ABSTRACT

In two recent papers, we developed a powerful technique to link the distribution of
galaxies to that of dark matter haloes by considering halo occupation numbers as a
function of galaxy luminosity and type. In this paper we use these distribution functions
to populate dark matter haloes in high-resolution $N$-body simulations of the standard
$\Lambda$CDM cosmogony with $\Omega_m = 0.3$, $\Omega_\Lambda = 0.7$, and $\sigma_8 = 0.9$. Stacking simulation boxes
of $100\,h^{-1}$ Mpc and $300\,h^{-1}$ Mpc with $512^3$ particles each, we construct Mock Galaxy
Redshift Surveys out to a redshift of $z = 0.2$ with a numerical resolution that guarantees
completeness down to $0.01 L^*$. We use these mock surveys to investigate various
clustering statistics. The predicted two-dimensional correlation function $\xi(r_p, \pi)$ reveals
clear signatures of redshift space distortions. The projected correlation functions
for galaxies with different luminosities and types, derived from $\xi(r_p, \pi)$, match the
observations well on scales larger than $\sim 3\,h^{-1}$ Mpc. On smaller scales, however, the model
overpredicts the clustering power by about a factor of two. Modeling the "finger-of-God"
effect on small scales reveals that the standard $\Lambda$CDM model predicts pairwise velocity
dispersions (PVD) that are $\sim 400$ km s$^{-1}$ too high at projected pair separations
of $\sim 1\,h^{-1}$ Mpc. A strong velocity bias in massive haloes, with $b_{\rm vel} = \sigma_{\rm gal}/\sigma_{\rm dm} \sim 0.6$
(where $\sigma_{\rm gal}$ and $\sigma_{\rm dm}$ are the velocity dispersions of galaxies and dark matter particles,
respectively) can reduce the predicted PVD to the observed level, but does not help to
resolve the over-prediction of clustering power on small scales. Consistent results can
be obtained within the standard $\Lambda$CDM model only when the average mass-to-light
ratio of clusters is of the order of $1000\,(M/L)_\odot$ in the $B$-band. Alternatively, as we
show by a simple approximation, a $\Lambda$CDM model with $\sigma_8 = 0.75$ may also reproduce
the observational results. We discuss our results in light of the recent WMAP results
and the constraints on $\sigma_8$ obtained independently from other observations.

Key words: dark matter - large-scale structure of the universe - galaxies: haloes - 
methods: statistical 



1 INTRODUCTION 

The distribution of galaxies contains important information 
about the large scale structure of the matter distribution. On 
large, linear scales the galaxy power spectrum is believed to 
be proportional to the matter power spectrum, therewith 
providing useful information regarding the initial conditions 
of structure formation, i.e., regarding the power spectrum 
of primordial density fluctuations. On smaller, non-linear 
scales the distribution and motion of galaxies are governed
by the local gravitational potential, which is cosmology dependent.
One of the main goals of large galaxy redshift surveys
is therefore to map the distribution of galaxies as accurately
as possible, over as large a volume as possible. The
Sloan Digital Sky Survey (SDSS; York et al. 2000) and the 
2 degree Field Galaxy Redshift Survey (2dFGRS; Colless 
et al. 2001) are two of the prime examples. These surveys, 
which are currently being completed, will greatly enhance 
and improve our knowledge of large-scale structure and will 
become the standard data sets against which to test our 
cosmological and galaxy formation models for the decade to 
come. 

However, two effects complicate a straightforward inter- 
pretation of the data. First of all, the distribution of galaxies 
is likely to be biased with respect to the underlying mass 
density distribution. This bias is an imprint of various complicated
physical processes related to galaxy formation such
as gas cooling, star formation, merging, tidal stripping and 
heating, and a variety of feedback processes. In fact, it is 
expected that the bias depends on scale, redshift, galaxy 
type, galaxy luminosity, etc. (Kauffmann, Nusser & Steinmetz
1997; Jing, Mo & Börner 1998; Somerville et al. 2001;
van den Bosch, Yang & Mo 2003). Therefore, in order to 
translate the observed clustering of galaxies into a measure 
for the clustering of (dark) matter, one needs to either un- 
derstand galaxy formation in detail, or use an alternative 
method to describe the relationship between galaxies and 
dark matter (haloes). One of the main goals of this pa- 
per is to advocate one such method and to show its poten- 
tial strength for advancing our understanding of large scale 
structure. 

Secondly, because of the peculiar velocities of galaxies, 
the clustering of galaxies observed in redshift-space is dis- 
torted with respect to the real-space clustering (e.g., Davis 
& Peebles 1983; Kaiser 1987; Regos & Geller 1991; van de 
Weygaert & van Kampen 1993; Hamilton 1992). On small 
scales, the virialized motion of galaxies within dark mat- 
ter haloes smears out structure along the line-of-sight (i.e., 
the so-called "finger-of-God" effect). On large scales, coher- 
ent flows induced by the gravitational action of large scale 
structure enhance structure along the line-of-sight. Both ef- 
fects cause an anisotropy in the two-dimensional, two-point 
correlation function $\xi(r_p, \pi)$, with $r_p$ and $\pi$ the pair separations
perpendicular and parallel to the line-of-sight, respectively.
The large-scale flows compress the contours of
$\xi(r_p, \pi)$ in the $\pi$-direction by an amount that depends on
$\beta = \Omega_m^{0.6}/b$. The small-scale peculiar motions imply that
$\xi(r_p, \pi)$ is convolved in the $\pi$-direction by the distribution
of pairwise velocities, $f(v_{12})$. Thus, the detailed structure of
$\xi(r_p, \pi)$ contains information regarding the Universal matter
density $\Omega_m$, the (linear) bias of galaxies $b$, and the pairwise
velocity distribution $f(v_{12})$.

From the above discussion it is obvious that understanding
galaxy bias is an integral part of understanding large
scale structure. One way to address galaxy bias without a
detailed theory for how galaxies form is to model halo occupation
statistics. One simply specifies halo occupation numbers,
$\langle N(M) \rangle$, which describe how many galaxies on average
occupy a halo of mass $M$. Many recent investigations have
used such halo occupation models to study various aspects of
galaxy clustering (Jing, Mo & Börner 1998; Peacock & Smith
2000; Seljak 2000; Scoccimarro et al. 2001; White 2001; Jing,
Börner & Suto 2002; Bullock, Wechsler & Somerville 2002;
Berlind & Weinberg 2002; Scranton 2002; Kang et al. 2002;
Marinoni & Hudson 2002; Zheng et al. 2002; Kochanek et
al. 2003). In two recent papers, Yang, Mo & van den Bosch
(2003; hereafter Paper I) and van den Bosch, Yang & Mo
(2003; hereafter Paper II) have taken this halo occupation
approach one step further by considering the occupation
as a function of galaxy luminosity and type. They introduced
the conditional luminosity function (hereafter CLF)
$\Phi(L|M)\,dL$, which gives the average number of galaxies with
luminosities in the range $L \pm dL/2$ that reside in haloes of mass
$M$. The advantage of this CLF over the halo occupation
function $\langle N(M) \rangle$ is that it allows one to address the clustering
properties of galaxies as a function of luminosity. In addition,
the CLF yields a direct link between the halo mass
function and the galaxy luminosity function, and allows a
straightforward computation of the average luminosity of
galaxies residing in a halo of given mass. Therefore, $\Phi(L|M)$
is not only constrained by the clustering properties of galaxies,
as is the case with $\langle N(M) \rangle$, but also by the observed LFs
and the halo mass-to-light ratios.

In Papers I and II we used the observed LFs and the 
luminosity- and type-dependence of the galaxy two-point 
correlation function to constrain the CLF in the standard
$\Lambda$CDM cosmology. In this paper, we use this CLF to populate
dark matter haloes in high-resolution $N$-body simulations.
The 'virtual Universes' thus obtained are used to construct
mock galaxy redshift surveys with volumes and apparent
magnitude limits similar to those in the 2dFGRS. This
is the first time that realistic mock surveys are constructed 
that (i) associate galaxies with dark matter haloes, (ii) are 
independent of a model for how galaxies form, and (iii) au- 
tomatically have the correct galaxy abundances and correla- 
tion lengths as function of galaxy luminosity and type. In the 
past, mock galaxy redshift surveys were constructed either 
by associating galaxies with dark matter particles (rather 
than haloes) using a completely ad hoc bias scheme (Cole et 
al. 1998), or by linking semi-analytical models for galaxy 
formation (with all their associated uncertainties) to the 
merger histories of dark matter haloes derived from numeri- 
cal simulations (Kauffmann et al. 1999; Mathis et al. 2002). 

We use our mock galaxy redshift survey to investigate 
a number of statistical measures of the large scale distribu- 
tion of galaxies. In particular, we focus on the two-point cor- 
relation function in redshift space, its distortions on small 
and large scales, and the galaxy pairwise peculiar veloci- 
ties. Where possible we compare our predictions with the 
2dFGRS and we discuss the sensitivity of these clustering 
statistics to several details regarding the halo occupation 
statistics. We show that the halo occupation obtained analytically
can reliably be implemented in $N$-body simulations.
We find that the standard $\Lambda$CDM model, together with the
halo occupation we have obtained, can reproduce many of 
the observational results. However, we find significant dis- 
crepancy between the model predictions and observations on 
small scales. We show that to get consistent results on small 
scales, either the mass-to-light ratios for clusters of galaxies 
are significantly higher than normally assumed, or the linear 
power spectrum has an amplitude that is significantly lower 
than its 'concordance' value. 

This paper is organized as follows. In Section 2 we review
the CLF formalism developed in Papers I and II. Section 3
introduces the $N$-body simulations and describes our
method of populating dark matter haloes in these simula- 
tions with galaxies of different type and luminosity. Section 4 
investigates several clustering statistics in real-space and fo- 
cuses on the accuracy with which mock galaxy distributions 
can be constructed using our CLF formalism. In Section 5 
we use these mock galaxy distributions to construct mock 
galaxy redshift surveys that are comparable in size with the 
2dFGRS. We extract the redshift-space two-point correla- 
tion function from this mock redshift survey, investigate its 
anisotropies induced by the galaxy peculiar motions, and
compare our results to those obtained from the 2dFGRS by 
Hawkins et al. (2003). In section 6 we discuss possible ways 
to alleviate the discrepancy between model and observations 
on small scales, and we summarize our results in Section 7. 

2 THE CONDITIONAL LUMINOSITY FUNCTION

In Paper I we developed a formalism, based on the conditional
luminosity function $\Phi(L|M)$, to link the distribution
of galaxies to that of dark matter haloes. We introduced a
parameterized form for $\Phi(L|M)$ which we constrained using
the LF and the correlation lengths as a function of luminosity.
In Paper II we extended this formalism by constructing
separate CLFs for the early- and late-type galaxies. In this 
paper we use these results to populate dark matter haloes, 
obtained from large numerical simulations, with both early- 
and late-type galaxies of different luminosities. For com- 
pleteness, we briefly summarize here the main ingredients of 
the CLF formalism, and refer the reader to papers I and II 
for more details. 

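The conditional luminosity function is parameterized by
a Schechter function:

$$\Phi(L|M)\,dL = \frac{\tilde{\Phi}^*}{\tilde{L}^*} \left(\frac{L}{\tilde{L}^*}\right)^{\tilde{\alpha}} \exp(-L/\tilde{L}^*)\,dL, \qquad (1)$$

where $\tilde{L}^* = \tilde{L}^*(M)$, $\tilde{\alpha} = \tilde{\alpha}(M)$ and $\tilde{\Phi}^* = \tilde{\Phi}^*(M)$ are all
functions of halo mass $M$†. Following Papers I and II, we
write the average total mass-to-light ratio of a halo of mass
$M$ as

$$\left\langle \frac{M}{L} \right\rangle (M) = \frac{1}{2} \left(\frac{M}{L}\right)_0 \left[ \left(\frac{M}{M_1}\right)^{-\gamma_1} + \left(\frac{M}{M_1}\right)^{\gamma_2} \right], \qquad (2)$$

which has four free parameters: a characteristic mass $M_1$,
for which the mass-to-light ratio is equal to $(M/L)_0$, and
two slopes, $\gamma_1$ and $\gamma_2$, that specify the behavior of $\langle M/L \rangle$
at the low and high mass ends, respectively. A similar
parameterization is adopted for the characteristic luminosity
$\tilde{L}^*(M)$: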



$$\frac{M}{\tilde{L}^*(M)} = [\ldots] \qquad (3)$$

$$[\ldots] \qquad (4)$$



Here $\Gamma(x)$ is the Gamma function and $\Gamma(a, x)$ the incomplete
Gamma function. This parameterization has two additional
free parameters: a characteristic mass $M_2$ and a power-law
slope $\gamma_3$. For $\tilde{\alpha}(M)$ we adopt a simple linear function of
$\log(M)$,

$$\tilde{\alpha}(M) = \alpha_{15} + \eta \log(M_{15}), \qquad (5)$$

with $M_{15}$ the halo mass in units of $10^{15}\,h^{-1} M_\odot$, $\alpha_{15} = \tilde{\alpha}(M_{15} = 1)$,
and $\eta$ describing the change of the faint-end
slope $\tilde{\alpha}$ with halo mass. Note that once $\tilde{\alpha}(M)$ and $\tilde{L}^*(M)$
are given, the normalization of the conditional LF, $\tilde{\Phi}^*(M)$,
is obtained through equations (1) and (2), using the fact
that the total (average) luminosity in a halo of mass $M$ is

$$\langle L \rangle (M) = \int_0^\infty \Phi(L|M)\, L\, dL = \tilde{\Phi}^* \tilde{L}^* \Gamma(\tilde{\alpha} + 2). \qquad (6)$$
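As an illustration of this normalization step, the following Python sketch evaluates equations (2), (5) and (6) for a single halo. It is only a sketch under stated assumptions: the logarithm in equation (5) is taken to be base 10, $\tilde{L}^*$ is passed in as an input rather than computed, and the values of `M1` and `L_star` used in the example are hypothetical placeholders, not fitted values. The function names are not taken from Papers I or II.

```python
import math


def mass_to_light(M, ML0, M1, gamma1, gamma2):
    """Average mass-to-light ratio <M/L>(M) of equation (2)."""
    x = M / M1
    return 0.5 * ML0 * (x ** (-gamma1) + x ** gamma2)


def faint_end_slope(M, alpha15, eta):
    """Faint-end slope of equation (5); M in h^-1 Msun, log assumed base 10."""
    return alpha15 + eta * math.log10(M / 1.0e15)


def clf_normalization(M, L_star, alpha, ML0, M1, gamma1, gamma2):
    """Solve equation (6) for the CLF normalization.

    The total luminosity is <L> = M / <M/L>, and equation (6) gives
    <L> = Phi* L* Gamma(alpha + 2), hence Phi* = <L> / (L* Gamma(alpha + 2)).
    """
    mean_L = M / mass_to_light(M, ML0, M1, gamma1, gamma2)
    return mean_L / (L_star * math.gamma(alpha + 2.0))


# Example with hypothetical M1 and L_star (illustration only).
M = 1.0e13
alpha = faint_end_slope(M, alpha15=-1.10, eta=-0.22)
phi_star = clf_normalization(M, L_star=1.0e10, alpha=alpha,
                             ML0=124.0, M1=1.0e11, gamma1=2.02, gamma2=0.30)
```

By construction, `phi_star * L_star * math.gamma(alpha + 2)` multiplied by the mass-to-light ratio returns the halo mass, which is the consistency condition that equations (1), (2) and (6) impose.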

Finally, we introduce the mass scale $M_{\rm min}$ below which we
set the CLF to zero; i.e., we assume that no stars form inside
haloes with $M < M_{\rm min}$. Motivated by reionization considerations
(see Paper I for details) we adopt $M_{\rm min} = 10^9\,h^{-1} M_\odot$
throughout.

† Halo masses are defined as the masses within the radius $r_{180}$
inside of which the average overdensity is 180.

In order to split the galaxy population into early and
late types, we follow Paper II and introduce the function
$f_{\rm late}(L, M)$, which specifies the fraction of galaxies with
luminosity $L$ in haloes of mass $M$ that are late-type. The CLFs
of late- and early-type galaxies are then given by

$$\Phi_{\rm late}(L|M)\,dL = f_{\rm late}(L, M)\, \Phi(L|M)\,dL \qquad (7)$$

and

$$\Phi_{\rm early}(L|M)\,dL = [1 - f_{\rm late}(L, M)]\, \Phi(L|M)\,dL. \qquad (8)$$

As with the CLF for the entire population of galaxies,
$\Phi_{\rm late}(L|M)$ and $\Phi_{\rm early}(L|M)$ are constrained by 2dFGRS
measurements of the LFs and the correlation lengths as a
function of luminosity. We assume that $f_{\rm late}(L, M)$ has a
quasi-separable form



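$$f_{\rm late}(L, M) = g(L)\, h(M)\, q(L, M). \qquad (9)$$

Here

$$q(L, M) = \begin{cases} 1 & \text{if } g(L)\,h(M) \le 1 \\ \dfrac{1}{g(L)\,h(M)} & \text{if } g(L)\,h(M) > 1 \end{cases} \qquad (10)$$

is to ensure that $f_{\rm late}(L, M) \le 1$. We adopt

$$g(L) = \frac{\Phi_{\rm late}(L)}{\Phi(L)} \, \frac{\int_0^\infty \Phi(L|M)\, n(M)\, dM}{\int_0^\infty \Phi(L|M)\, h(M)\, n(M)\, dM}, \qquad (11)$$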



where $n(M)$ is the halo mass function (Sheth & Tormen
1999; Sheth, Mo & Tormen 2001), $\Phi_{\rm late}(L)$ and $\Phi(L)$ correspond
to the observed LFs of the late-type and entire galaxy
samples, respectively, and

$$h(M) = \max\left(0, \min\left[1, \frac{\log(M/M_a)}{\log(M_b/M_a)}\right]\right), \qquad (12)$$

with $M_a$ and $M_b$ two additional free parameters, defined
as the masses at which $h(M)$ takes on the values 0 and
1, respectively. As shown in Paper II, this parameterization
allows the population of galaxies to be split into early-
and late-types such that their respective LFs and clustering
properties are well fitted.
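The late-type fraction of equations (9), (10) and (12) can be sketched in a few lines of Python. This is an illustrative sketch, not the code used for the paper: the value of $g(L)$ is passed in as a plain number (evaluating equation (11) would require the full CLF and halo mass function), and the function names are hypothetical. The values of $M_a$ and $M_b$ in the example are those quoted for the adopted model.

```python
import math


def h_of_mass(M, M_a, M_b):
    """Equation (12): ramps linearly in log(M) from 0 at M_a to 1 at M_b."""
    ratio = math.log10(M / M_a) / math.log10(M_b / M_a)
    return max(0.0, min(1.0, ratio))


def late_fraction(g_L, M, M_a, M_b):
    """Equations (9)-(10): f_late = g h q, where q caps the product at 1."""
    product = g_L * h_of_mass(M, M_a, M_b)
    # q = 1 if product <= 1, else 1 / product, so g * h * q = min(product, 1).
    return product if product <= 1.0 else 1.0


# Example: masses in h^-1 Msun; the value g_L = 1.2 is purely illustrative.
M_a, M_b = 10.0 ** 17.26, 10.0 ** 10.86
f_small = late_fraction(1.2, 1.0e11, M_a, M_b)
f_cluster = late_fraction(1.2, 1.0e15, M_a, M_b)
```

Because $M_a > M_b$ for the adopted parameters, $h(M)$ decreases with halo mass, so for fixed $g(L)$ the late-type fraction is lower in massive haloes.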

In Papers I and II we presented a number of different
CLFs for different cosmologies and different assumptions
regarding the free parameters. In what follows we focus on
the flat $\Lambda$CDM cosmology with $\Omega_m = 0.3$, $\Omega_\Lambda = 0.7$ and
$h = H_0/(100\,{\rm km\,s^{-1}\,Mpc^{-1}}) = 0.7$ and with initial density
fluctuations described by a scale-invariant power spectrum
with normalization $\sigma_8 = 0.9$. These cosmological parameters
are in good agreement with a wide range of observations,
including the recent WMAP results (Spergel et al. 2003),
and in what follows we refer to this as the "concordance"
cosmology. Finally, we adopt the CLF with the following
parameters: $M_1 = 10^{1\ldots}\,h^{-1} M_\odot$, $M_2 = 10^{1\ldots}\,h^{-1} M_\odot$,
$M_a = 10^{17.26}\,h^{-1} M_\odot$, $M_b = 10^{10.86}\,h^{-1} M_\odot$, $(M/L)_0 = 124\,h\,(M/L)_\odot$,
$\gamma_1 = 2.02$, $\gamma_2 = 0.30$, $\gamma_3 = 0.72$, $\eta = -0.22$
and $\alpha_{15} = -1.10$. This model (referred to as model D in
Paper II) yields excellent fits to the observed LFs and the
observed correlation lengths as a function of both luminosity
and type‡.

3 POPULATING HALOES WITH GALAXIES 
3.1 Numerical Simulations 

The main goal of this paper is to use the CLF described in 
the previous Section to construct mock galaxy redshift sur- 
veys, and to study a number of statistical properties of these 
distributions that can be compared with observations from 
existing or forthcoming redshift surveys. The distribution of
dark matter haloes is obtained from a set of large $N$-body
simulations (dark matter only). The set consists of a total
of six simulations with $N = 512^3$ particles each, that have
been carried out on the VPP5000 Fujitsu supercomputer of
the National Astronomical Observatory of Japan with the
vectorized-parallel P$^3$M code (Jing & Suto 2002). Each simulation
evolves the distribution of the dark matter from an
initial redshift of $z = 72$ down to $z = 0$ in a $\Lambda$CDM 'concordance'
cosmology. All simulations consider boxes with periodic
boundary conditions; in two cases $L_{\rm box} = 100\,h^{-1}$ Mpc
while the other four simulations all have $L_{\rm box} = 300\,h^{-1}$ Mpc.
Different simulations with the same box size are completely
independent realizations and are used to estimate errors due
to cosmic variance. The particle masses are $6.2 \times 10^8\,h^{-1} M_\odot$
and $1.7 \times 10^{10}\,h^{-1} M_\odot$ for the small and large box simulations,
respectively. One of the simulations with $L_{\rm box} = 100\,h^{-1}$ Mpc
has previously been used by Jing & Suto (2002) to derive a
triaxial model for density profiles of CDM haloes, and we refer
the reader to that paper for complementary information
about the simulations. In what follows we refer to simulations
with $L_{\rm box} = 100\,h^{-1}$ Mpc and $L_{\rm box} = 300\,h^{-1}$ Mpc as
$L_{100}$ and $L_{300}$ simulations, respectively.
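The quoted particle masses follow directly from the cosmology and the box size, as the short Python check below illustrates. The value of the critical density, $\rho_{\rm crit} \approx 2.775 \times 10^{11}\,h^2 M_\odot\,{\rm Mpc}^{-3}$, is a standard constant that is assumed here rather than taken from the text, and the function name is hypothetical.

```python
# Critical density of the Universe in h^2 Msun / Mpc^3 (standard value, assumed).
RHO_CRIT = 2.775e11


def particle_mass(box_size, n_per_side, omega_m=0.3):
    """Particle mass in h^-1 Msun for a box of side box_size (h^-1 Mpc)."""
    total_mass = omega_m * RHO_CRIT * box_size ** 3  # mean matter mass in the box
    return total_mass / n_per_side ** 3


m_100 = particle_mass(100.0, 512)   # small boxes
m_300 = particle_mass(300.0, 512)   # large boxes

# Smallest haloes kept in the catalogues (10 particles or more).
min_halo_100 = 10 * m_100
min_halo_300 = 10 * m_300
```

The 10-particle limit of the large boxes evaluates to roughly $2 \times 10^{11}\,h^{-1} M_\odot$, which is the low-mass cut visible in the $L_{300}$ halo mass function.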

‡ Note that the parameters listed here are slightly different from
those given in the original version of Paper II, as they are based on
a corrected version of the galaxy luminosity function. As shown
in Paper I, a change in the overall amplitude of the luminosity
function in the fitting has some effect on the best-fit values of
the correlation lengths. This is due to the combination of the
following two effects. First, our model assumes a fixed mass-to-light
ratio for massive haloes and so a change in the amplitude of
the luminosity function leads to a change in the relative number
of galaxies in small/large haloes. Second, although the correlation
length as a function of luminosity was used as input in our fitting
of the conditional luminosity function, there is some freedom for
the 'best-fit' values of the correlation lengths to change in the
fitting, because the error bars on the observed correlation lengths
are quite large.

Dark matter haloes are identified using the standard
friends-of-friends (FOF) algorithm (Davis et al. 1985) with
a linking length of 0.2 times the mean inter-particle separation.
Haloes obtained with this linking length have a mean
overdensity of $\sim 180$ (Porciani, Dekel & Hoffman 2002), in
good agreement with the definition of halo masses used in
our CLF analysis. For each individual simulation we construct
a catalogue of haloes with 10 particles or more, for
which we store the mass (number of particles), the position
of the most bound particle, and the halo's mean velocity
and velocity dispersion. Note that the FOF algorithm can
sometimes select poor systems (those with a small number of
particles) that are spurious and have abnormally large velocity
dispersions. We therefore checked that the particles
assigned to a system by the FOF algorithm are
gravitationally bound. Our test showed
that this correction is important only for low-mass haloes,
and that it has almost no effect on our results. The left panel
of Fig. 1 plots the $z = 0$ halo mass functions for one of the
$L_{100}$ simulations and for one of the $L_{300}$ simulations (histograms),
with all spurious haloes erased. For comparison,
we also plot (solid line) the analytical halo mass function
given in Sheth & Tormen (1999) and Sheth, Mo & Tormen
(2001)§. The agreement is remarkably good, both between
the two simulations and between the simulation results and
the theoretical prediction.
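The friends-of-friends grouping described above can be sketched in pure Python as follows. This is a minimal illustration and not the halo finder used for the simulations: it compares all particle pairs (so it is only practical for small test sets), it ignores the periodic boundary conditions, and it does not include the gravitational binding check. The function names are hypothetical.

```python
import math


def linking_length(box_size, n_per_side, b=0.2):
    """Linking length: b times the mean inter-particle separation."""
    return b * box_size / n_per_side


def friends_of_friends(points, link):
    """Group points that are connected by chains of separations below link.

    Returns a list of groups, each a list of indices into points.
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent pointers to the group representative (path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) < link:
                parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


# Toy example: a chain of three linked particles plus one isolated particle.
toy = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.2, 0.0, 0.0), (5.0, 5.0, 5.0)]
toy_groups = friends_of_friends(toy, 0.15)
```

A production halo finder would replace the pair loop with a spatial tree or grid search and wrap separations across the periodic box.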

Note that our choice for box sizes of $100\,h^{-1}$ Mpc and
$300\,h^{-1}$ Mpc is a compromise between high mass resolution
and a sufficiently large volume to study the large-scale structure.
The impact of mass resolution is apparent from considering
the conditional probability function

$$P(M|L)\,dM = \frac{\Phi(L|M)}{\Phi(L)}\, n(M)\,dM, \qquad (13)$$

(see Paper I), which gives the probability that a galaxy
of luminosity $L$ resides in a halo with mass in the range
$M \pm dM/2$. The right panel of Fig. 1 plots this probability
distribution obtained from the CLF given in Section 2
for four different luminosities: $L = L^*/100$, $L = L^*/10$,
$L = L^*$, and $L = 10 L^*$. Whereas $10 L^*$ galaxies are typically
found in haloes with $10^{13}\,h^{-1} M_\odot \lesssim M \lesssim 10^{15}\,h^{-1} M_\odot$,
galaxies with $L = L^*/100 \sim 10^8\,h^{-2} L_\odot$ typically reside in
haloes of $M \sim 5 \times 10^{10}\,h^{-1} M_\odot$. Comparing these probability
distributions with the halo mass functions in the left panel,
we see that the $L_{300}$ simulations can only yield a complete
galaxy distribution down to $L \sim 0.4 L^*$. The $L_{100}$ simulation,
however, resolves dark matter haloes down to masses
of $10^{10}\,h^{-1} M_\odot$, which is sufficient to model the galaxy population
down to $L \sim 0.01 L^*$. On the other hand, luminous
galaxies may be under-represented in this small box simulation,
because it contains fewer massive haloes than expected.
Combining these two sets of simulations, however, will enable
us to study the clustering properties of galaxies covering
a sufficiently large volume and a sufficiently large range of
luminosities.



3.2 Halo Occupation Numbers 

§ This same mass function is used in the CLF analysis described
in Section 2.

Figure 1. The left-hand panel plots the halo mass functions of the numerical simulations discussed in the text (histograms). The
mass function with a low mass cut at about $2 \times 10^{11}\,h^{-1} M_\odot$ corresponds to a simulation with $L_{\rm box} = 300\,h^{-1}$ Mpc, while the other
corresponds to a $L_{100}$ simulation with $L_{\rm box} = 100\,h^{-1}$ Mpc. The solid curve is the Sheth, Mo & Tormen (2001) mass function which
is shown for comparison. Note the excellent agreement, both between the two simulations and between the simulation results and the
theoretical prediction. The right-hand panel plots the conditional probability distributions $P(M|L)$ for four different luminosities as
indicated. $L^* = 1.1 \times 10^{10}\,h^{-2} L_\odot$ is the characteristic luminosity of the Schechter fit to the 2dFGRS LF of Madgwick et al. (2002).
Combining these conditional probability distributions with the halo mass functions shown in the left-hand panel gives an indication of
the completeness level that can be obtained with both the $L_{100}$ and $L_{300}$ simulations (see text).

When populating haloes with galaxies based on the CLF
one first needs to choose a minimum luminosity. Based on
the mass resolution of the simulations we adopt $L_{\rm min} = 10^7\,h^{-2} L_\odot$
throughout. The mean occupation number of
galaxies with $L > L_{\rm min}$ for a halo with mass $M$ then follows
from the CLF according to:

$$\langle N(M) \rangle = \int_{L_{\rm min}}^{\infty} \Phi(L|M)\,dL. \qquad (14)$$

In order to Monte-Carlo sample occupation numbers for individual
haloes one requires the full probability distribution
$P(N|M)$ (with $N$ an integer) of which $\langle N(M) \rangle$ gives the
mean, i.e.,

$$\langle N(M) \rangle = \sum_{N=0}^{\infty} N\, P(N|M). \qquad (15)$$

As a simple model we adopt

$$P(N|M) = \begin{cases} N' + 1 - \langle N(M) \rangle & \text{if } N = N' \\ \langle N(M) \rangle - N' & \text{if } N = N' + 1 \\ 0 & \text{otherwise} \end{cases} \qquad (16)$$

Here $N'$ is the largest integer smaller than $\langle N(M) \rangle$. Thus,
the actual number of galaxies in a halo of mass $M$ is either
$N'$ or $N' + 1$. This particular model for the distribution
of halo occupation numbers is supported by semi-analytical
models and hydrodynamical simulations of structure formation
(Benson et al. 2000; Berlind et al. 2003) which indicate
that the halo occupation probability distribution is narrower
than a Poisson distribution with the same mean. In addition,
distribution (16) is successful in yielding power-law correlation
functions, much more so than for example a Poisson
distribution (Benson et al. 2000; Berlind & Weinberg 2002).
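The two steps of this subsection, computing the mean occupation of equation (14) and drawing an integer occupation from equation (16), can be sketched in Python as follows. The integration scheme (a trapezoidal rule in $\ln L$, truncated at 50 times the characteristic luminosity) and all function names are assumptions made for illustration; the Schechter parameters are passed in directly.

```python
import math
import random


def schechter_clf(L, phi_star, L_star, alpha):
    """Schechter-form CLF of equation (1)."""
    x = L / L_star
    return (phi_star / L_star) * x ** alpha * math.exp(-x)


def mean_occupation(L_min, phi_star, L_star, alpha, n_steps=2000, cutoff=50.0):
    """Equation (14): integrate the CLF above L_min (trapezoid rule in ln L)."""
    lo, hi = math.log(L_min), math.log(cutoff * L_star)
    step = (hi - lo) / n_steps
    total = 0.0
    for k in range(n_steps + 1):
        L = math.exp(lo + k * step)
        weight = 0.5 if k in (0, n_steps) else 1.0
        total += weight * schechter_clf(L, phi_star, L_star, alpha) * L  # dL = L dlnL
    return total * step


def sample_occupation(mean_n, rng):
    """Equation (16): return floor(<N>) or floor(<N>) + 1 with the correct mean."""
    n_low = math.floor(mean_n)
    return n_low + (1 if rng.random() < mean_n - n_low else 0)


rng = random.Random(0)
draws = [sample_occupation(2.3, rng) for _ in range(20000)]
```

For an integer mean the sketch always returns that integer, which agrees with equation (16) even though `math.floor` then differs from $N'$ by one.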



3.3 Assigning galaxies their luminosity and type 

Since the CLF only gives the average number of galaxies
with luminosities in the range $L \pm dL/2$ in a halo of mass
$M$, there are many different ways in which one can assign
luminosities to the $N_i$ galaxies of halo $i$, and yet be consistent
with the CLF. The simplest approach would be to simply
draw $N_i$ luminosities (with $L > L_{\rm min}$) randomly from
$\Phi(L|M)$. Alternatively, one could use a more deterministic
approach, and, for instance, always demand that the $j$th
brightest galaxy has a luminosity in the range $[L_j, L_{j-1}]$.
Here $L_j$ is defined such that a halo has on average $j$ galaxies
with $L > L_j$, i.e.,

$$\int_{L_j}^{\infty} \Phi(L|M)\,dL = j. \qquad (17)$$

We adopt an intermediate approach in most of our discussion,
giving special treatment only to the one brightest
galaxy per halo. The luminosity of this so-called "central"
galaxy, $L_c$, is drawn from $\Phi(L|M)$ with the restriction
$L > L_1$ and thus has an expectation value of

$$\langle L_c(M) \rangle = \int_{L_1}^{\infty} \Phi(L|M)\, L\, dL = \tilde{\Phi}^* \tilde{L}^* \Gamma(\tilde{\alpha} + 2, L_1/\tilde{L}^*). \qquad (18)$$

The remaining $N_i - 1$ galaxies are referred to as "satellite"
galaxies and are assigned luminosities in the range
$L_{\rm min} < L < L_1$, again drawn from the distribution function
$\Phi(L|M)$. In Section 4.2, we test the effect of luminosity
sampling by comparing the results obtained from all
three approaches.
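The intermediate scheme can be sketched in Python by tabulating the cumulative number of galaxies brighter than $L$ and inverting it. The sketch below is an assumption-laden illustration rather than the paper's implementation: the cumulative function is tabulated on a logarithmic grid truncated at 50 times the characteristic luminosity, the inversion uses linear interpolation, haloes with fewer than one expected galaxy simply draw the central from the whole range, and all function names are hypothetical.

```python
import bisect
import math
import random


def cumulative_table(phi_star, L_star, alpha, L_min, n_steps=4000, cutoff=50.0):
    """Tabulate N(>L), the integral of the CLF from L upwards, on a log grid."""
    lo, hi = math.log(L_min), math.log(cutoff * L_star)
    step = (hi - lo) / n_steps
    lums = [math.exp(lo + k * step) for k in range(n_steps + 1)]
    # Integrand per unit ln L: Phi(L|M) * L.
    f = [(phi_star / L_star) * (L / L_star) ** alpha * math.exp(-L / L_star) * L
         for L in lums]
    n_above = [0.0] * (n_steps + 1)
    for k in range(n_steps - 1, -1, -1):
        n_above[k] = n_above[k + 1] + 0.5 * step * (f[k] + f[k + 1])
    return lums, n_above


def luminosity_at(lums, n_above, target):
    """Luminosity L at which N(>L) equals target (linear interpolation)."""
    neg = [-n for n in n_above]             # increasing sequence for bisect
    k = bisect.bisect_left(neg, -target)    # first index with N(>L) <= target
    k = min(max(k, 1), len(lums) - 1)
    n0, n1 = n_above[k - 1], n_above[k]
    frac = 0.0 if n0 == n1 else (n0 - target) / (n0 - n1)
    return lums[k - 1] + frac * (lums[k] - lums[k - 1])


def assign_luminosities(n_gal, lums, n_above, rng):
    """Central from L > L_1, the n_gal - 1 satellites from L_min < L < L_1."""
    total = n_above[0]
    n_central = min(1.0, total)             # N(>L_1) = 1 defines L_1, eq. (17)
    central = luminosity_at(lums, n_above, rng.random() * n_central)
    satellites = [
        luminosity_at(lums, n_above, n_central + rng.random() * (total - n_central))
        for _ in range(n_gal - 1)
    ]
    return central, satellites


# Illustrative parameters in units where L_star = 1 (not fitted values).
lums, n_above = cumulative_table(phi_star=5.0, L_star=1.0, alpha=0.0, L_min=0.01)
L_1 = luminosity_at(lums, n_above, 1.0)
central, satellites = assign_luminosities(4, lums, n_above, random.Random(1))
```

Drawing a uniform random value of $N(>L)$ and inverting it is equivalent to drawing $L$ from the CLF restricted to the corresponding luminosity interval, which is why the central and satellite draws differ only in the range of the uniform variate.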

Finally, the galaxies are assigned morphological types
as follows. For each galaxy with luminosity $L$ in a halo of
mass $M$ we draw a random number $\mathcal{R}$ in the range [0, 1]. If
$\mathcal{R} < f_{\rm late}(L, M)$ then the galaxy is a late-type, otherwise an
early-type.
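In code, this type assignment is a single comparison against a uniform random number. The Python sketch below assumes that the late-type fraction has already been evaluated for the galaxy in question; the function name and the string labels are illustrative choices.

```python
import random


def assign_type(f_late, rng):
    """Return 'late' with probability f_late, otherwise 'early'."""
    return "late" if rng.random() < f_late else "early"


rng = random.Random(0)
types = [assign_type(0.7, rng) for _ in range(20000)]
late_share = types.count("late") / len(types)
```

Over many draws the share of late types converges to the input fraction, so the type-dependent CLFs of equations (7) and (8) are reproduced on average.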

Figure 2. Projected dark matter/galaxy distributions of a $100 \times 100 \times 10\,h^{-1}$ Mpc slice in one of the $L_{100}$ mock galaxy distributions. The
panels show (clockwise from top-left) the dark matter particles, all galaxies (early plus late), early-type galaxies, and late-type galaxies.
Galaxies are weighted by their luminosities. Note how the galaxies trace the large scale structure of the dark matter, and how early-type
galaxies are more strongly clustered than late-type galaxies.



3.4 Assigning galaxies their phase-space 
coordinates 

Once the population of galaxies has been assigned luminosities
and types, they need to be assigned a position within
their halo as well as a peculiar velocity. The central galaxy is
assumed to be located at the "center" of the corresponding
dark halo, which we associate with the position of the most
bound particle, and its peculiar velocity is set equal to the
mean halo velocity (cf. Yoshikawa, Jing & Börner 2003). For
the satellite galaxies we follow two different approaches. In
the first, we assign the $N_i - 1$ satellites the positions and
peculiar velocities of $N_i - 1$ randomly selected dark matter
particles that are part of the FOF halo under consideration.
This thus corresponds to a scenario in which satellite galaxies
are completely unbiased with respect to the density and
velocity distribution of dark matter particles in FOF haloes.
We refer to satellite galaxies populated this way as "FOF
satellites".
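A minimal Python sketch of the "FOF satellites" scheme is given below. It assumes that the member particles of a halo are available as a list of (position, velocity) pairs and that particles are selected without replacement; the text does not specify the latter, and the function name is hypothetical.

```python
import random


def fof_satellites(particles, n_gal, rng):
    """Pick n_gal - 1 distinct member particles to host the satellites.

    particles is a list of (position, velocity) tuples for one FOF halo;
    each satellite inherits the phase-space coordinates of its particle.
    """
    n_sat = max(n_gal - 1, 0)
    return rng.sample(particles, min(n_sat, len(particles)))


# Toy halo with ten member particles at rest along the x-axis.
members = [((float(k), 0.0, 0.0), (0.0, 0.0, 0.0)) for k in range(10)]
sats = fof_satellites(members, 4, random.Random(3))
```

Because every member particle is equally likely to be chosen, the satellites trace the density and velocity distribution of the dark matter in the halo without bias.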



Figure 3. Same as Fig. 2, but for a $300 \times 300 \times 20\,h^{-1}$ Mpc slice taken from one of the $L_{300}$ mock galaxy distributions.

We also consider a more analytical model for the satellite
distribution. This allows us first of all to assess whether
a simple analytical description can be found to describe the
population of satellite galaxies, and secondly, provides us
with a simple framework to investigate the sensitivity of
various clustering statistics to details regarding the density
and velocity bias of satellite galaxies. We assume that the
number density distribution of satellite galaxies follows an
NFW density distribution (Navarro, Frenk & White 1997):

$$\rho(r) = \frac{\bar{\delta}\,\bar{\rho}}{(r/r_s)(1 + r/r_s)^2}, \qquad (19)$$



where $r_s$ is a characteristic radius, $\bar{\rho}$ is the average density of
the Universe, and $\bar{\delta}$ is a dimensionless amplitude which can
be expressed in terms of the halo concentration parameter
$c = r_{180}/r_s$ as

$$\bar{\delta} = \frac{180}{3} \, \frac{c^3}{\ln(1 + c) - c/(1 + c)}. \qquad (20)$$
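Equation (20) is the amplitude for which the mean density inside $r_{180}$ equals 180 times the average density of the Universe. The Python sketch below checks this numerically by averaging the profile of equation (19) over the sphere; the midpoint integration and the function names are illustrative assumptions.

```python
import math


def nfw_amplitude(c):
    """Dimensionless amplitude of equation (20)."""
    return (180.0 / 3.0) * c ** 3 / (math.log(1.0 + c) - c / (1.0 + c))


def mean_overdensity(c, n_steps=20000):
    """Mean of rho / rho_bar inside r_180, with x = r / r_s running from 0 to c."""
    delta = nfw_amplitude(c)
    dx = c / n_steps
    total = 0.0
    for k in range(n_steps):
        x = (k + 0.5) * dx                       # midpoint of the radial shell
        profile = delta / (x * (1.0 + x) ** 2)   # equation (19) in units of rho_bar
        total += profile * x * x * dx            # shell volume element (without 4 pi)
    return 3.0 * total / c ** 3                  # divide by sphere volume (without 4 pi)


overdensity_c10 = mean_overdensity(10.0)
overdensity_c5 = mean_overdensity(5.0)
```

Both values evaluate to 180 within the accuracy of the numerical integration, independent of the concentration.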



Here $r_{180}$ is the radius inside of which the halo has an average
overdensity of 180. Numerical simulations show that
halo concentration depends on halo mass, and we use the
relation given by Bullock et al. (2001), converted to the $c$
appropriate for our definition of halo mass. The radial number
density distribution of satellite galaxies is assumed to
follow equation (19) with a concentration $c_g = c$, and the
angular position is assumed to be random over the $4\pi$ solid
angle. Peculiar velocities are assumed to be the sum of the
peculiar (mean) velocity of the host halo and a random velocity
which is assumed to be distributed isotropically and
to follow a Gaussian, one-dimensional velocity distribution:

$$f(v_j) = \frac{1}{\sqrt{2\pi}\,\sigma_{\rm gal}} \exp\left(-\frac{v_j^2}{2\sigma_{\rm gal}^2}\right). \qquad (21)$$



Here $v_j$ is the velocity relative to that of the central galaxy
along axis $j$, and $\sigma_{\rm gal}$ is the one-dimensional velocity dispersion
of the galaxies, which we set equal to that of the
dark matter particles, $\sigma_{\rm dm}$, in the halo under consideration.
We refer to satellite galaxies populated this way as "NFW
satellites".
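The "NFW satellites" scheme can be sketched in Python as follows. The radius is drawn by inverting the enclosed-number fraction of the NFW profile with a bisection search, the direction is drawn uniformly on the sphere, and each velocity component receives a Gaussian offset as in equation (21). The bisection approach, the function names, and the example values of $c$, $r_{180}$ and the velocity dispersion are illustrative assumptions, not details given in the text.

```python
import math
import random


def nfw_enclosed(x):
    """Enclosed number within x = r / r_s, up to a constant factor."""
    return math.log(1.0 + x) - x / (1.0 + x)


def sample_nfw_radius(c, r_180, rng):
    """Draw a radius inside r_180 from the NFW profile of equation (19)."""
    target = rng.random() * nfw_enclosed(c)
    lo, hi = 0.0, c
    for _ in range(60):                      # bisection on the monotonic function
        mid = 0.5 * (lo + hi)
        if nfw_enclosed(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi) * r_180 / c       # convert x back to a radius


def random_direction(rng):
    """Unit vector distributed uniformly over the 4 pi solid angle."""
    cos_theta = 2.0 * rng.random() - 1.0
    phi = 2.0 * math.pi * rng.random()
    sin_theta = math.sqrt(1.0 - cos_theta ** 2)
    return (sin_theta * math.cos(phi), sin_theta * math.sin(phi), cos_theta)


def nfw_satellite(halo_pos, halo_vel, c, r_180, sigma_gal, rng):
    """Position and velocity of one satellite relative to the box origin."""
    r = sample_nfw_radius(c, r_180, rng)
    direction = random_direction(rng)
    pos = tuple(p + r * d for p, d in zip(halo_pos, direction))
    vel = tuple(v + rng.gauss(0.0, sigma_gal) for v in halo_vel)
    return pos, vel


rng = random.Random(2)
radii = [sample_nfw_radius(10.0, 1.0, rng) for _ in range(20000)]
pos, vel = nfw_satellite((0.0, 0.0, 0.0), (100.0, 0.0, 0.0), 10.0, 1.0, 300.0, rng)
```

Setting `sigma_gal` below the dark matter dispersion, or using a satellite concentration different from that of the halo, would be the natural way to explore velocity and density bias within this framework.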



© 0000 RAS, MNRAS 000, 000-000 



Yang, Mo, Jing, van den Bosch & Chu 



[Figure 4: two panels, labelled $L_{\rm box} = 100\,h^{-1}$ Mpc (left) and $L_{\rm box} = 300\,h^{-1}$ Mpc (right); lower horizontal axis $\log[L]$ ($h^{-2} L_\odot$), upper horizontal axis $M_{b_J} - 5 \log h$.]

Figure 4. The luminosity functions of the mock galaxies constructed from the $L_{100}$ (left) and $L_{300}$ (right) halo catalogues (solid lines).
For comparison, we also plot the LFs obtained by Madgwick et al. (2002) for all galaxies (circles), for late-type galaxies (triangles)
and for early-type galaxies (stars). For clarity, the latter two LFs have been shifted down by one and two orders of magnitude in the
$y$-direction, respectively. Except for incompleteness effects due to the sampling of the halo mass function (see text for details), the mock
galaxy distributions have LFs that are in excellent agreement with the data.



4 RESULTS IN REAL SPACE 

Figs 2 and 3 show slices of mock galaxy distributions (hereafter
MGDs) constructed from $L_{100}$ and $L_{300}$ simulations,
respectively. Satellite galaxies are assigned positions and ve- 
locities using the NFW scheme outlined above. Results are 
shown for all galaxies (upper right panels), and separately 
for early types (lower right panels) and late types (lower left 
panels). For comparison, we also show the distribution of 
dark matter particles in the upper left panels. Note how the 
large scale structure in the dark matter distribution is de- 
lineated by the distribution of galaxies, and that early-type 
galaxies are more strongly clustered than late-type galaxies. 

In this section we discuss the general, real-space prop- 
erties of these MGDs. In Section 5 below we construct mock 
galaxy redshift surveys to investigate the impact of redshift 
distortions. The main goal of this section, however, is to in- 
vestigate with what accuracy the combination of numerical 
simulations and our CLF analysis can be used to construct 
self-consistent mock galaxy distributions. In particular, we 
want to examine to what accuracy these MGDs can recover 
the input used to constrain the CLFs. Note that this is not 
a trivial question. The CLF modeling is based on the halo 
model, which only yields an approximate description of the 
dark matter distribution in the non-linear regime (see dis- 
cussions in Cooray & Sheth 2002 and Huffenberger & Seljak 
2003). In addition, as described in Section 3, the CLF alone 
does not yield sufficient information to construct MGDs, 
and we had to make additional assumptions regarding the 
distribution of galaxies within individual haloes. A further 



goal of this section, therefore, is to investigate how these 
assumptions impact on the clustering statistics. 



4.1 The luminosity function 

The CLFs used to construct the MGDs shown in Figs. 2
and 3 are constrained by the 2dFGRS luminosity functions
for early- and late-type galaxies obtained by Madgwick et 
al. (2002). Therefore, as long as the halo mass function 
is well sampled by the simulations, the LFs of our MGDs 
should match those of Madgwick et al. (2002). Fig. 4 shows 
a comparison between the 2dFGRS LFs (symbols with errorbars)
and the ones recovered from the MGDs (solid lines).
To emphasize the level of agreement between the recov- 
ered LFs and the input LFs, Fig. 5 plots the ratio between 
the two. Over a large range of luminosities, the recovered 
LFs match the observational input extremely well. In the 
L300 simulation, however, the LFs are under-estimated for
$L \lesssim 3 \times 10^{9}\,h^{-2}L_\odot$ ($M_{b_J} - 5\log h \gtrsim -18.4$). This owes
to the absence of haloes with $M \lesssim 2 \times 10^{11}\,h^{-1}M_\odot$ (see
Fig. 1). Note how this discrepancy sets in at higher $L$ for
the late-type galaxies than for the early-types, because the
latter are preferentially located in more massive haloes. For
the early-types the L300 mock is virtually complete down to
$M_{b_J} - 5\log h \simeq -17$ (see Fig. 10 of Paper II), reflecting the
fact that only a very small fraction of the early-type galaxies 
brighter than this magnitude reside in haloes below the mass 
resolution limit. In the L100 simulations, on the other hand, 
the LFs accurately match the data down to the faintest lu- 
minosities, but here the MGD underestimates the LFs at 
Figure 6. Two point correlation functions for dark matter particles (left panel) and mock galaxies (right panel). The dotted and dashed
lines correspond to results from the L300 and L100 simulations, respectively. The solid line in the left panel corresponds to the evolved,
non-linear correlation function for the dark matter obtained by Smith et al. (2003), and is shown for comparison. Due to the limited
box-sizes, the L300 (L100) simulations slightly over (under) predict the correlation power on large scales with respect to Smith et al.'s
model. The 2PCFs in the right panel are calculated for galaxies with absolute magnitudes $M_{b_J} - 5\log h < -18.4$, which corresponds to
the completeness limit of the L300 MGDs. Note that the box size also affects the 2PCFs of the mock galaxies on large scales. Errorbars
are the variance among the two (L100) and four (L300) independent realizations.



the bright end ($M_{b_J} - 5\log h \lesssim -22$). This owes to the limited
boxsize, which causes the number of massive haloes (the
main hosts of the brightest galaxies) to be underestimated
(cf. Fig. 1). Note that even the LFs of the L300 simulations
underestimate the observed number of bright galaxies. This
reflects a small inaccuracy of our CLF in matching
the observed bright end of the LFs (see Paper II).

4.2 The real-space correlation function 

In addition to the LFs of early- and late-type galaxies, 
the CLFs used here to construct our MGDs are also con- 
strained by the luminosity and type dependence of the cor- 
relation lengths as measured from the 2dFGRS by Norberg 
et al. (2002a). Here we check to what degree this "input" is 
recovered from the MGDs. 

The left panel of Fig. 6 plots the real-space two-point 
correlation functions (2PCFs) for dark matter particles in 
the L100 (dashed line) and L300 (dotted line) simulations. 
The solid line corresponds to the evolved, non-linear dark 
matter correlation function of Smith et al. (2003) and is 
shown for comparison (see footnote). As one can see, on large scales
($r \gtrsim 6\,h^{-1}$ Mpc) the correlation amplitude obtained from
the L100 simulations is systematically lower than both that 
obtained from the L300 simulations and that obtained from 

[Footnote: In fitting the CLF we have used this function to compute the
correlation length of the dark matter (see Paper II).]



the fitting formula of Smith et al., suggesting that the box-size
effect is non-negligible in the L100 simulations. Note also
that the large scale correlation amplitude given by the L300
simulations is slightly higher than Smith et al.'s model.
It is unclear if this discrepancy is due to the inaccuracy of 
the fitting formula, or due to cosmic variance in the present 
simulations. As we will see below, this discrepancy limits the 
accuracy of model predictions. 

The right-hand panel of Fig. 6 plots the 2PCFs for the 
galaxies in the L100 (dashed line) and L300 (dotted line) 
MGDs. Note how the galaxies reveal the same trend on large 
scales as the dark matter particles, with larger correlations 
in the L300 than in the L100 MGD. 

Fig. 7 shows the correlation lengths $r_0$ as a function of
luminosity for all (upper panel), early-type (middle panel)
and late-type (lower panel) galaxies. These have been obtained
by fitting $\xi(r)$ with a power law relation of the form
$\xi(r) = (r/r_0)^{-\gamma}$ over the same range of scales as used by
Norberg et al. (2002a). Solid squares and open stars correspond
to correlation lengths obtained from the L300 and
L100 MGDs, respectively. Note that the errorbars on the predicted
correlation lengths are based on the scatter among independent
simulation boxes. They are significantly smaller
than the errorbars on the observational data, because the 
model predictions are based on real-space correlation func- 
tions, while the observational results are based on projected 
correlation functions in redshift space. The agreement with 
the data (open circles) is reasonable, even though several 
systematic trends are apparent. In particular, the correlation

Figure 5. The ratio of the luminosity function of mock galaxies,
$\Phi_{\rm mock}(L)$, to that of the 2dFGRS, $\Phi_{\rm 2dFGRS}(L)$ (taken from
Madgwick et al. 2002). The thin errorbars indicate the errors on
$\Phi_{\rm 2dFGRS}(L)$. The thick solid (dashed) lines correspond to the
LFs obtained from the L100 (L300) simulations. The errorbars
for the mock galaxies are obtained from the 1-$\sigma$ variance of the
two L100 and the four L300 simulations, respectively. See text for
discussion.



lengths obtained from the L300 simulation are slightly
higher than the observations while the opposite applies to
the L100 simulation. These discrepancies are due to two effects.
First of all, as shown in Fig. 6 the dark matter on large
scales is more strongly clustered in the L300 simulations than
in the L100 simulations. That this can account for most of 
the differences between the scale-lengths obtained from the 
L300 and L100 simulations, is illustrated by the dotted and 
solid horizontal lines, which indicate the correlation lengths 
of the dark matter particles in the L300 and L100 simulations, 
respectively. Secondly, the measured correlation lengths cor- 
respond to a non-zero, median redshift which is larger for the 
more luminous galaxies. In determining the best-fit param- 
eters for the CLF this redshift effect is taken into account 
(see Papers I and II). However, in the construction of our 
MGDs, we only use the dark matter distribution at z = 0. 
Figure 7. The real space correlation length, $r_0$, as a function
of galaxy luminosity and type. Top panel shows the results for
the combined sample of early- plus late-type galaxies, while the
middle (bottom) panel shows results for the early (late) type
galaxies only. Solid squares and stars correspond to the correlation
lengths obtained from the L300 and L100 simulations, respectively.
The errorbars correspond to the 1-$\sigma$ variance from the
two (four) independent realizations for L100 (L300). We also indicate
(open circles with errorbars) the correlation lengths obtained
from the 2dFGRS by Norberg et al. (2002a). In the upper panel,
we also plot the correlation lengths for dark matter particles for
L100 (solid line) and L300 (dotted line) simulations. Although the
agreement between data and MGDs is reasonable there are small
but significant differences. The reason for these discrepancies is
discussed in the text.



As discussed in Paper I, this can over-estimate the correla- 
tion length by about 10%. Given these sources of systematic 
errors, one should be careful not to over-interpret any dis- 
crepancy between the correlation lengths in the mock survey 
and those obtained from real redshift distributions. 
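
The power-law fits behind the correlation lengths of Fig. 7 can be illustrated with a short sketch. It assumes an unweighted linear least-squares fit in log-log space, which may differ from the fitting procedure actually used (for example in how errors are weighted); the function name `fit_power_law` and the synthetic input are hypothetical.

```python
import numpy as np

def fit_power_law(r, xi):
    """Fit xi(r) = (r / r0)**(-gamma) by least squares in log-log space.

    Assumption: unweighted fit; r and xi are positive arrays restricted
    to the range of scales over which the power law is meant to hold.
    """
    # log xi = -gamma * log r + gamma * log r0
    slope, intercept = np.polyfit(np.log(r), np.log(xi), 1)
    gamma = -slope
    r0 = np.exp(intercept / gamma)
    return r0, gamma

# Synthetic example: an exact power law with r0 = 5 and gamma = 1.8.
r = np.logspace(0.0, 1.0, 20)
xi = (r / 5.0) ** (-1.8)
r0_fit, gamma_fit = fit_power_law(r, xi)
```

Because the fit is linear in log space, an overall shift in clustering amplitude (such as the box-size effect discussed above) moves $r_0$ while leaving $\gamma$ essentially unchanged.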

In order to investigate the sensitivity of the 2PCF in 
the MGDs to the way we assign luminosities and phase- 
Figure 8. The ratio of the 2PCF $\xi(r)$ in three MGDs compared
to that of our fiducial MGD. The only difference among these
various MGDs is the way that we assign luminosities and phase-space
coordinates to the galaxies. Solid (dotted) lines correspond
to an MGD in which we use a deterministic (random) method to
assign galaxies their luminosities (see Section 3.3 for definitions).
In the MGD corresponding to the dashed line we use the intermediate,
fiducial method to assign luminosities, but here we use
'FOF satellites' rather than 'NFW satellites' (see Section 3.4 for
definitions). Results are shown for galaxies in three different magnitude
bins (as indicated) in one of the L300 simulations. However,
results for the L100 simulations look virtually identical.



space coordinates to the galaxies within the dark matter 
haloes, we construct MGDs using one of the L300 simula- 
tions with different models for the luminosity assignment 
and spatial distribution of satellite galaxies within haloes. 
We have confirmed that using one of the L100 simulations in- 
stead yields the same results. We first test the impact of the 
luminosity assignment. Here, instead of the fiducial model 
for the luminosity assignment (the intermediate approach 
discussed in Section 3.3), we use both the deterministic and 
random assignments (see Section 3.3 for definitions) to construct
the MGDs. In Fig. 8 we show the ratios between the
correlation functions obtained from these MGDs and those 
obtained from the fiducial MGD. For bright galaxies, the 



deterministic model gives the lowest amplitudes on small
scales ($r \lesssim 1\,h^{-1}$ Mpc), while the random model gives the
highest amplitudes. This is expected. The mean number of
bright galaxies in a typical halo is not much larger than 1
and so not many close pairs of bright galaxies are expected
in the deterministic model. More such pairs are expected
in the random model because more than one galaxy in a
typical halo can be assigned a large luminosity due to random
fluctuations. The dashed lines in Fig. 8 correspond to an
MGD with FOF satellites (see Section 3.4). The agreement
of the 2PCFs between this MGD with 'FOF satellites' and
our fiducial MGD indicates that the spherical NFW model
is a good approximation of the average density distribution
of dark matter haloes. We have also tested the impact of
changing the concentration of galaxies, $c_g$; increasing (decreasing)
$c_g$ with respect to the dark matter halo concentration,
$c$, increases (decreases) the 2PCFs on small scales
($r \lesssim 1\,h^{-1}$ Mpc). However, even when changing the ratio $c_g/c$
by a factor of two, the amplitude of this change is smaller
than the differences resulting from changing the luminosity
assignment.

All in all, changes in the way we assign luminosities and
phase-space coordinates to the galaxies only have a mild impact
on the 2PCFs, and only at small scales $\lesssim 1\,h^{-1}$ Mpc.
This is in good agreement with Berlind & Weinberg (2002)
who have shown that these effects are much smaller than
changes in the second moment of the halo occupation distributions.
For example, assuming a Poissonian $P(N|M)$,
rather than equation (16) has a much larger impact on the
2PCFs than any of the changes investigated above. As we
show in Section 5 below, with the $P(N|M)$ of equation (16)
we obtain correlation functions that are in better agreement 
with observations, providing empirical support for this par- 
ticular occupation number distribution. 

It is interesting to note that although small changes in 
the way we assign luminosities and phase-space coordinates
do not have a big impact on the statistical measurements we 
are considering here, such changes can lead to quite different 
results for other statistical measures. As shown in van den 
Bosch et al. (2004), various statistics of satellite galaxies 
around bright galaxies can be used to distinguish models 
that make similar predictions about the clustering on large 
scales. 



4.3 Pairwise velocities 

The peculiar velocities of galaxies are determined by the ac- 
tion of the gravitational field, and so are directly related to 
the matter distribution in the Universe. Observationally, the 
properties of galaxy peculiar velocities are inferred from dis- 
tortions in the correlation function. We defer this discussion 
to Section 5. Here we derive statistical quantities directly 
from the simulated peculiar velocities of galaxies. 

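We define the pairwise peculiar velocity of a galaxy pair as

$$v_{12}(r) = [\mathbf{v}(\mathbf{x}+\mathbf{r}) - \mathbf{v}(\mathbf{x})] \cdot \hat{\mathbf{r}} \qquad (22)$$

with $\mathbf{v}(\mathbf{x})$ the peculiar velocity of a galaxy at $\mathbf{x}$. The mean
pairwise peculiar velocity and the pairwise peculiar velocity
dispersion (PVD) are

$$\langle v_{12}(r) \rangle \quad {\rm and} \quad \sigma_{12}(r) = \left\langle \left[ v_{12}(r) - \langle v_{12}(r) \rangle \right]^2 \right\rangle^{1/2} \qquad (23)$$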
Figure 9. The mean pairwise velocities (upper panels) and pairwise velocity dispersions (lower panels) estimated from the three-dimensional
(real-space) velocities of the mock galaxies and dark matter particles. All results correspond to the L300 simulations only.
Left-hand panels compare dark matter particles (solid circles) with galaxies either with NFW satellites (open circles) or with FOF
satellites (open stars). Right-hand panels display the galaxy-type dependence for a model with NFW satellites (errorbars indicate the
rms-scatter among the four independent L300 simulations). See text for detailed discussion.



where $\langle \cdots \rangle$ denotes an average over all pairs of separation
$r$.
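
As an illustration of these two statistics, the sketch below computes the mean pairwise velocity and the PVD for all pairs in one separation bin. It is a brute-force example under stated assumptions: the velocity difference is projected onto the unit separation vector (so that infall gives negative values), periodic boundaries are ignored, and the function name `pairwise_velocity_stats` is hypothetical. It is not the code used for Fig. 9.

```python
import numpy as np

def pairwise_velocity_stats(pos, vel, r_lo, r_hi):
    """Mean pairwise velocity and PVD for pairs with r_lo <= r < r_hi.

    pos, vel: (N, 3) arrays of positions and peculiar velocities.
    Assumption: O(N^2) pair enumeration without periodic wrapping,
    suitable only for small illustrative samples.
    """
    i, j = np.triu_indices(len(pos), k=1)        # every pair counted once
    sep = pos[j] - pos[i]                        # separation vector r
    dist = np.linalg.norm(sep, axis=1)
    mask = (dist >= r_lo) & (dist < r_hi) & (dist > 0.0)
    r_hat = sep[mask] / dist[mask][:, None]
    # v12 = [v(x + r) - v(x)] . r_hat ; negative values mean approach
    v12 = np.einsum("ij,ij->i", vel[j][mask] - vel[i][mask], r_hat)
    return v12.mean(), v12.std()

# Two galaxies approaching each other along the x-axis.
pos = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
vel = np.array([[0.0, 0.0, 0.0], [-100.0, 0.0, 0.0]])
mean_v12, sigma_12 = pairwise_velocity_stats(pos, vel, 0.5, 2.0)
```

For realistic catalogue sizes a tree-based or grid-based pair search would replace the explicit pair enumeration.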

In order to gain insight, we compute $\langle v_{12}(r) \rangle$ and $\sigma_{12}(r)$
from the L300 simulations for both dark matter particles and
for galaxies with $M_{b_J} - 5\log h < -18.4$ (which corresponds
to the completeness limit of these simulations, see Fig. 4).

Results are shown in Fig. 9. The upper left panel com- 
pares the mean pairwise peculiar velocities of the dark mat- 
ter particles (solid circles) with those of two realizations of 
the galaxies: one with 'NFW satellites' (open circles) and 
the other with 'FOF satellites' (stars). At sufficiently small 
separations, one probes the virialized regions of dark matter 
haloes, and one thus finds $\langle v_{12} \rangle = 0$. At larger separations,
one starts to probe the infall regions around the virialized
haloes, yielding negative values for $\langle v_{12}(r) \rangle$. Finally, at sufficiently
large separations $\langle v_{12}(r) \rangle \rightarrow 0$ due to the large scale
homogeneity and isotropy of the Universe.

Both the dark matter particles and the galaxies from
our MGDs indeed reveal such a behavior, with $\langle v_{12}(r) \rangle$ peaking
at $\sim 3\,h^{-1}$ Mpc. However, there is a marked
difference between the $\langle v_{12}(r) \rangle$ of galaxies in the MGD with
NFW satellites and that of the dark matter. In this particular
MGD, the galaxies experience significantly smaller infall



velocities than the dark matter particles. However, this difference
between dark matter and galaxies is almost absent in
the MGD with FOF satellites. This is due to the fact that
in the NFW model, we populate satellites with isotropic
velocity dispersions within a sphere of radius $r_{180}$. We are
thus assuming that the entire region out to $r_{180}$ is virialized
in that there is no net infall. However, simple collapse
models predict that for our concordance cosmology only the
region out to $r_{340}$ (i.e., the radius inside of which the average
overdensity is 340) is virialized (Bryan & Norman 1998).
The difference between the MGDs with NFW satellites and
FOF satellites indicates that the regions between $r_{340}$ and
$r_{180}$ are still infalling, resulting in non-zero $\langle v_{12} \rangle$.

In the lower-left panel, we compare the PVDs for galax- 
ies and dark matter particles. Here the MGDs with FOF 
satellites and NFW satellites are fairly similar, and signifi- 
cantly lower than for the dark matter. This can be under- 
stood as follows. At small separations, the PVD is a pair 
weighted measure for the potential well in which dark mat- 
ter particles (galaxies) reside. For the galaxies in our MGDs 
the halo occupation number per unit mass, N/M, decreases 
with the mass of dark matter haloes (see Paper II). There- 
fore, the massive haloes (with larger velocity dispersions) 
Figure 10. Distribution of pairwise velocities, $f(v_{12})$, for dark matter particles (solid curves), and for mock galaxies in the L300
simulation. Results are shown for four separations $r$ as indicated, and for all galaxies (dot-dashed lines), early-type galaxies (dotted lines)
and late-type galaxies (dashed lines). On small scales ($r < 1\,h^{-1}$ Mpc) the pairwise velocity distributions are symmetric and reveal an
obvious exponential form. On larger scales, however, $f(v_{12})$ reveals clear asymmetries: for $v_{12} < 0$ the distribution is still close to an
exponential, while for $v_{12} > 0$ the distribution more resembles a normal distribution.



contribute relatively less to the PVDs of galaxies. Although 
the difference between the $\sigma_{12}(r)$ of the MGDs with FOF
and NFW satellites shows that the PVDs have some de- 
pendence on the details regarding the infall regions around 
virialized haloes, these effects are typically small. 

The upper-right and lower-right panels of Fig. 9 show 
how $\langle v_{12}(r) \rangle$ and $\sigma_{12}(r)$ depend on galaxy type. Results are
shown for the MGD based on NFW satellites. The mean 
velocities for early-type galaxies are larger than those for 
late-type galaxies on large scales, but smaller on small scales. 
In addition, the PVD of early-type galaxies is higher than 
that of late-type galaxies on all scales. All these differences 
are easily understood as a reflection of the fact that early- 
type galaxies are preferentially located in the larger, more 
massive haloes which have larger velocity dispersions and 
larger infall velocities. 

Fig. 10 shows the pairwise velocity distributions for 
four different separations r, within a logarithmic interval of 
$\Delta \log r = 0.125$. On small scales, the distribution is well fit
by an exponential for both dark matter particles and galax- 
ies. This validates the assumption made in earlier analyses 
about this distribution (Davis & Peebles 1983; Mo, Jing & 



Borner 1993; Fisher et al. 1994; Marzke et al. 1995). It is 
also consistent with earlier results obtained from theoreti- 
cal models and numerical simulations based on dark matter 
particles (Diaferio & Geller 1996; Sheth 1996; Mo, Jing & 
Borner 1997; Seto & Yokoyama 1998; Efstathiou et al. 1988; 
Magira, Jing & Suto 2000). For larger separations $f(v_{12})$ is
skewed towards negative values of $v_{12}$, because galaxies tend
to approach each other due to gravitational infall. Clearly, a
single exponential function is no longer a good approximation
to the pairwise peculiar velocity distribution at large
separations. Although for $v_{12} < 0$ (infall) the exponential
remains remarkably accurate, for $v_{12} > 0$ the pairwise velocity
distribution reveals a more Gaussian behavior. This
may have important implications for the derivation of PVDs
(especially at large separations), which typically is based on
the assumption of a purely exponential $f(v_{12})$. We shall return
to this issue in more detail in Section 5.2.
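
For reference, the exponential form referred to above is commonly written as follows. This is the standard zero-mean parametrization, given here as an assumption about notation rather than a formula quoted from this paper; $\sigma_{12}$ denotes the PVD. The normalization is chosen so that the distribution integrates to unity and has second moment $\sigma_{12}^2$.

```latex
f(v_{12}) = \frac{1}{\sqrt{2}\,\sigma_{12}}
            \exp\left( -\frac{\sqrt{2}\,|v_{12}|}{\sigma_{12}} \right),
\qquad
\int_{-\infty}^{\infty} f(v_{12})\,{\rm d}v_{12} = 1,
\qquad
\int_{-\infty}^{\infty} v_{12}^{2}\, f(v_{12})\,{\rm d}v_{12} = \sigma_{12}^{2}.
```

A Gaussian with the same dispersion has lighter tails, which is why the two forms can be told apart at large $|v_{12}|$.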

5 RESULTS IN REDSHIFT SPACE 

The statistical quantities of galaxy clustering discussed in 
the previous section are based on real distances between 
Figure 11. The stacking geometry of the L100 and L300 simulation
boxes used to construct the MSB mock galaxy redshift
surveys. The virtual observer is located at the center of the stack,
indicated by a thick solid dot. Note that for MGRSs in the MB
set, the stack of $6 \times 6 \times 6$ L100 boxes is replaced by a stack of
$2 \times 2 \times 2$ L300 boxes.

galaxies in our MGDs. However, because of the peculiar velocities
of galaxies, such quantities cannot be obtained directly
from a galaxy redshift survey. On small scales the virialized
motion of galaxies within dark matter haloes causes a
reduction of the correlation power, while on larger scales the
correlations are boosted due to the infall motion of galaxies
towards overdense regions (Kaiser 1987; Hamilton 1992).
As discussed in the introduction, these distortions contain 
useful information about the Universal density parameter, 
the bias of galaxies on large (linear) scales, and the pairwise 
velocities of galaxies. 

In this section, we use the MGDs presented above 
to construct large mock galaxy redshift surveys (hereafter 
MGRSs). The main goals are to compare various clustering 
statistics from these mock surveys with observational data 
from the 2dFGRS, and to investigate how the details about 
the CLF and the distribution of galaxies within haloes im- 
pact on these statistics. For the model-data comparison we 
use the large scale structure analysis of Hawkins et al. (2003; 
hereafter H03), which is based on a subsample of the 2dF- 
GRS consisting of all galaxies located in the North Galactic 
Pole (NGP) and South Galactic Pole (SGP) survey strips 
with redshift $0.01 < z < 0.20$ and apparent magnitude
$b_J < 19.3$. This sample consists of $\sim 166,000$ galaxies covering
an area on the sky of $\sim 1090\ {\rm deg}^2$.

In order to carry out a proper comparison between
model and observation, we aim to construct MGRSs that
have the same selections as the 2dFGRS. First of all, the
survey depth of $z_{\rm max} = 0.2$ implies that we need to cover a
volume with a depth of $600\,h^{-1}$ Mpc, i.e., twice that of our
big L300 simulations. In principle, we could stack $4 \times 4 \times 4$
identical L300 boxes (which have periodic boundary conditions),
so that a depth of $600\,h^{-1}$ Mpc can be achieved in
all directions for an observer located at the center of the
stack. However, there is one problem with this set-up; as
we have shown in Figs. 1 and 4 the L300 MGD is only
complete down to $M_{b_J} - 5\log h \simeq -18.4$. Taking account
of the apparent magnitude limit of the survey, this implies
that our MGRSs would be incomplete out to a distance of
$\sim 350\,h^{-1}$ Mpc. We can overcome this problem by using the
higher resolution L100 simulation, which is complete down to
$M_{b_J} - 5\log h \simeq -14$. We therefore replace the central $2 \times 2 \times 2$
L300 boxes with a stack of $6 \times 6 \times 6$ L100 boxes. The final layout
of our virtual universe is illustrated in Figure 11. Unless
stated otherwise, satellite galaxies are assigned to dark matter
haloes based on our standard NFW method described in
Section 3.4.
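
The distance of about $350\,h^{-1}$ Mpc quoted above can be checked with a back-of-the-envelope calculation. The sketch below uses only the distance modulus and neglects k-corrections and the difference between luminosity and comoving distance; these simplifications, the function name `limiting_distance_mpc`, and the use of the sample limit of 19.3 mag as the survey limit are assumptions made for illustration.

```python
def limiting_distance_mpc(abs_mag, app_mag_limit):
    """Distance (Mpc) at which an object of absolute magnitude abs_mag
    reaches the apparent magnitude limit app_mag_limit.

    Assumption: m - M = 5 log10(d / 10 pc), with no k-correction.
    If abs_mag is given as M - 5 log h, the result is in h^-1 Mpc.
    """
    distance_modulus = app_mag_limit - abs_mag
    distance_pc = 10.0 ** (distance_modulus / 5.0 + 1.0)
    return distance_pc / 1.0e6

# Completeness limits of the two sets of mock galaxy distributions.
d_L300 = limiting_distance_mpc(-18.4, 19.3)   # roughly 350 h^-1 Mpc
d_L100 = limiting_distance_mpc(-14.0, 19.3)   # roughly 46 h^-1 Mpc
```

Within the first distance, galaxies fainter than the L300 completeness limit would still pass the apparent magnitude cut, which is why the central region is populated from the higher resolution L100 boxes.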

Observational selection effects, which are modelled ac- 
cording to the final public data release of the 2dFGRS (see 
also Norberg et al. 2002b), are taken into account using the 
following steps: 

(i) We place a virtual observer at the center of the stack of
boxes (the solid dot in Figure 11), define a ($\alpha$, $\delta$)-coordinate
frame, and remove all galaxies that are not located in the
areas equivalent to the NGP and SGP regions of the 2dFGRS.

(ii) Next, for each galaxy we compute the redshift as 
'seen' by the virtual observer. We take the observational 
velocity uncertainties into account by adding a random velocity
drawn from a Gaussian distribution with dispersion
$85\,{\rm km\,s^{-1}}$ (Colless et al. 2001).

(iii) We compute the apparent magnitude of each galaxy 
according to its luminosity and distance. Since galaxies in 
the 2dFGRS were pruned by apparent magnitude before a 
k-correction was applied, we proceed as follows: We first apply
a negative k-correction, then select galaxies according to
the position-dependent magnitude limit (obtained using the
apparent magnitude limit masks provided by the 2dFGRS
team), and finally k-correct the magnitudes back to their
rest-frame $b_J$-band. Throughout we use the type-dependent
k-corrections given in Madgwick et al. (2002).

(iv) To mimic the position- and magnitude-dependent 
completeness of the 2dFGRS, we randomly sample each 
galaxy using the completeness masks provided by the 2dFGRS
team. The incompleteness of the 2dFGRS parent sample
is taken into account by randomly discarding 9% of all
mock galaxies (Norberg et al. 2002b).

(v) Finally, we mimic the actual selection criteria of the 
2dFGRS sample used in H03 by restricting the sample to 
galaxies within the redshift range 0.01 < z < 0.20 and with 
completeness > 0.7. 
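
Step (ii) of the list above can be sketched as follows. The example assumes the low-redshift approximation $cz = H_0 d + v_{\rm los}$, with distances in $h^{-1}$ Mpc and $H_0 = 100\,h\,{\rm km\,s^{-1}\,Mpc^{-1}}$, which may differ from the exact redshift-distance relation used for the actual mock surveys; the function name `observed_cz` and its arguments are hypothetical.

```python
import numpy as np

def observed_cz(pos, vel, observer, rng, sigma_v=85.0, H0=100.0):
    """Redshift velocity cz (km/s) of each galaxy as seen by the observer.

    pos, vel: (N, 3) positions (h^-1 Mpc) and peculiar velocities (km/s).
    Assumption: linear Hubble law plus line-of-sight peculiar velocity,
    plus a Gaussian measurement error of dispersion sigma_v (km/s).
    """
    rel = pos - observer
    dist = np.linalg.norm(rel, axis=1)
    los = rel / dist[:, None]                       # unit line-of-sight vectors
    v_los = np.einsum("ij,ij->i", vel, los)         # radial peculiar velocity
    noise = rng.normal(0.0, sigma_v, size=len(pos)) # redshift measurement error
    return H0 * dist + v_los + noise

rng = np.random.default_rng(42)
pos = np.array([[100.0, 0.0, 0.0], [0.0, 250.0, 0.0]])
vel = np.array([[300.0, 0.0, 0.0], [0.0, -200.0, 50.0]])
cz = observed_cz(pos, vel, observer=np.zeros(3), rng=rng)
```

The subsequent steps (magnitude limit, completeness masks, redshift cut) would then act as boolean selections on the resulting catalogue.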

Each MGRS thus constructed contains, on average, 
169000 galaxies, with a dispersion of $\sim 5000$ due to cosmic
variance. The number of galaxies in our mock catalogues is
consistent with the observations at the $1\sigma$ level. Note that
the correlation functions presented by H03 have been cor- 
rected for the observational bias due to fiber collisions, and 
we therefore do not mimic these effects in our MGRSs. 

Since we have 2 L100 simulations and 4 L300 simulations,
we construct $2 \times 4 = 8$ mock catalogues with different
combinations of small- and big-box simulations. In what
follows, we refer to this set of mock catalogues as MSBs 
Figure 12. The distribution of a sub-set of galaxies in one of the MSB mock samples. For clarity, we plot galaxies only in two 3-degree
slices, one in the 'North Galactic Pole' region (NGP) and the other in the 'South Galactic Pole' region (SGP). Only galaxies with redshifts
in the range $0.01 < z < 0.2$ are plotted.



(for Mock Small/Big). As an example, Fig. 12 shows the 
distribution of a sub-set of galaxies in one of these mock 
catalogues. Although each of our MSB catalogs covers an 
extremely large volume, and should thus not be very sensi- 
tive to cosmic variance, it is constructed using simulations 
with box sizes of 100 and $300\,h^{-1}$ Mpc only. If, for instance,
the L100 simulation contains a big cluster, the $6 \times 6 \times 6$ reproduction
of this box in our MGRSs might introduce some
unrealistic features. Furthermore, as shown in Section 4.2
the L100 box underestimates the amount of clustering on
large scales. Therefore, this set of MGRSs, which replicate 
this box 216 times ($6 \times 6 \times 6$), might underestimate the clustering on
large scales as well. In order to test the sensitivity of our 
results to these potential problems, and to have a better 
handle on the impact of cosmic variance in our mock sur- 
veys, we construct four alternative MGRSs. Each consists of 
a $4 \times 4 \times 4$ stack of one of the four L300 simulations (i.e., we
replace the $6 \times 6 \times 6$ stack of L100 boxes by a $2 \times 2 \times 2$ stack
of L300 boxes). In what follows we refer to this set of mock
catalogues as MBs (for Mock Big). These MGRSs, although
incomplete for $M_{b_J} - 5\log h > -18.4$, should not suffer from
the lack of clustering power on large scales. The MSB set, 
on the other hand, does not suffer from incompleteness, but 
instead lacks some large scale power. As we will see below, 
both the MSB and MB mocks give similar results on large 
scales, suggesting that the box-size effect does not have a 
significant influence on our results. 



5.1 Two-Point Correlation Functions 

From our MGRSs we compute $\xi(r_p,\pi)$ using the estimator
(Hamilton 1993)

$$\xi(r_p,\pi) = \frac{\langle RR \rangle \langle DD \rangle}{\langle DR \rangle^2} - 1 \qquad (24)$$

with $\langle DD \rangle$, $\langle RR \rangle$, and $\langle DR \rangle$ the number of galaxy-galaxy,
random-random, and galaxy-random pairs with separation
($r_p$, $\pi$). Here $r_p$ and $\pi$ are the pair separations perpendicular
and parallel to the line-of-sight, respectively. Explicitly, for
a pair ($\mathbf{s}_1$, $\mathbf{s}_2$), with $\mathbf{s}_i = c z_i \hat{\mathbf{r}}_i / H_0$, we define

$$\pi = \frac{\mathbf{s} \cdot \mathbf{l}}{|\mathbf{l}|}, \qquad r_p = \sqrt{\mathbf{s} \cdot \mathbf{s} - \pi^2} \qquad (25)$$

Here $\mathbf{l} = \frac{1}{2}(\mathbf{s}_1 + \mathbf{s}_2)$ is the line of sight intersecting the pair,
and $\mathbf{s} = \mathbf{s}_1 - \mathbf{s}_2$. Random samples are constructed using
two different methods. The first uses the mean galaxy number
density at redshift $z$ calculated from the 2dFGRS LF.
The second randomizes the coordinates of all mock galaxies
within the simulation box. Both methods yield indistinguishable
estimates of $\xi(r_p,\pi)$ and in what follows we only
use the former. Following H03 each galaxy in a pair with
redshift separation $s$ is weighted by the factor



$$w_i = \frac{1}{1 + 4\pi n(z_i) J_3(s)} \qquad (26)$$
Figure 13. The upper panels show the projected 2PCFs $w_p(r_p)/r_p$ for galaxies of different luminosity and type, as a function of $r_p$ ($h^{-1}$ Mpc). The errorbars correspond
to the 1-$\sigma$ variance among distinct MGRSs (i.e. among the 8 MSBs for the faintest subsamples, and among the 4 MBs for the brightest
subsamples). For clarity, the error bars are only plotted for the brightest and faintest subsamples. The lower panels plot the ratios of
these $w_p(r_p)$ to that of a reference sample. The reference sample contains all galaxies within the magnitude range $-19.5 > M'_{b_J} > -20.5$
(with $M'_{b_J} = M_{b_J} - 5\log h$). Note that the faintest subsamples, which are impacted by the boxsize effect of the L100 simulation, reveal
a 'break' at $r_p \sim 10\,h^{-1}$ Mpc.



with $n(z)$ the number density distribution as function of redshift
and $J_3(s) = \int_0^s \xi(s')\,s'^2\,{\rm d}s'$. Hence each galaxy-galaxy,
random-random, and galaxy-random pair is given a weight
$w_i w_j$. We substitute $\xi(s')$ with a power law using the same
parameters as in H03. This redshift dependent weighting
scheme is designed to minimize the variance on the estimated
correlation function (Davis & Huchra 1982; Hamilton
1993).
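
The ingredients of this measurement can be sketched in a few lines. The example assumes that the pair counts have already been normalized by the total number of pairs of each kind, takes the absolute value of the line-of-sight separation, and uses hypothetical function names (`hamilton_xi`, `split_separation`, `pair_weight`); it illustrates the Hamilton (1993) estimator and the pair decomposition described above rather than reproducing the actual pipeline.

```python
import numpy as np

def hamilton_xi(dd, dr, rr):
    """Hamilton (1993) estimator: xi = DD * RR / DR**2 - 1.

    Assumption: dd, dr, rr are normalized pair counts in the same
    (r_p, pi) bins, given as scalars or arrays of equal shape.
    """
    dd, dr, rr = (np.asarray(a, dtype=float) for a in (dd, dr, rr))
    return dd * rr / dr**2 - 1.0

def split_separation(s1, s2):
    """Split a pair of redshift-space positions into (r_p, pi)."""
    s = s1 - s2                       # redshift-space separation vector
    l = 0.5 * (s1 + s2)               # line of sight through the pair
    pi = abs(np.dot(s, l)) / np.linalg.norm(l)
    rp = np.sqrt(max(np.dot(s, s) - pi**2, 0.0))
    return rp, pi

def pair_weight(n_z, j3):
    """Minimum-variance weight 1 / (1 + 4 pi n(z) J3(s)) for one galaxy."""
    return 1.0 / (1.0 + 4.0 * np.pi * n_z * j3)

# A pair separated purely along the line of sight, and one purely across it.
rp_a, pi_a = split_separation(np.array([0.0, 0.0, 100.0]), np.array([0.0, 0.0, 110.0]))
rp_b, pi_b = split_separation(np.array([-1.0, 0.0, 100.0]), np.array([1.0, 0.0, 100.0]))
```

In a weighted measurement, each pair would contribute the product of the two galaxy weights to its ($r_p$, $\pi$) bin instead of a unit count.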

Since the redshift-space distortions only affect $\pi$, the
projection of $\xi(r_p,\pi)$ along the $\pi$ axis removes these
distortions and gives a function that is more closely related
to the real-space correlation function. In fact, this projected
2PCF is related to the real-space 2PCF through a simple
Abel transform

$$w_p(r_p) = \int_{-\infty}^{\infty} \xi(r_p,\pi)\,{\rm d}\pi = 2 \int_{r_p}^{\infty} \xi(r)\,\frac{r\,{\rm d}r}{\sqrt{r^2 - r_p^2}} \qquad (27)$$



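(Davis & Peebles 1983). Therefore, if the real-space 2PCF
is a power-law, $\xi(r) = (r_0/r)^{\gamma}$, the projected 2PCF $w_p(r_p)$
can be written as

$$w_p(r_p) = \sqrt{\pi}\,\frac{\Gamma(\gamma/2 - 1/2)}{\Gamma(\gamma/2)} \left( \frac{r_0}{r_p} \right)^{\gamma} r_p . \qquad (28)$$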
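
The closed form for a power-law correlation function can be checked numerically against the Abel integral. The sketch below is an illustration under stated assumptions: it substitutes $r = r_p/\cos\theta$, which turns the integral into $2 r_p (r_0/r_p)^{\gamma} \int_0^{\pi/2} \cos^{\gamma-2}\theta\,{\rm d}\theta$, and evaluates it with a simple midpoint rule; the function names and the example values $r_0 = 5$, $\gamma = 1.8$ are hypothetical.

```python
import math
import numpy as np

def wp_power_law(rp, r0, gamma):
    """Closed-form projected 2PCF for xi(r) = (r0 / r)**gamma."""
    pref = math.sqrt(math.pi) * math.gamma(gamma / 2.0 - 0.5) / math.gamma(gamma / 2.0)
    return pref * (r0 / rp) ** gamma * rp

def wp_numerical(rp, r0, gamma, n=200_000):
    """Abel integral evaluated with the substitution r = rp / cos(theta).

    Assumption: midpoint rule on (0, pi/2); the integrand has a weak,
    integrable singularity at pi/2 when gamma < 2, which the midpoint
    rule avoids sampling directly.
    """
    h = (math.pi / 2.0) / n
    theta = (np.arange(n) + 0.5) * h
    integral = np.sum(np.cos(theta) ** (gamma - 2.0)) * h
    return 2.0 * rp * (r0 / rp) ** gamma * integral

ratio = wp_numerical(5.0, 5.0, 1.8) / wp_power_law(5.0, 5.0, 1.8)
```

The ratio should be very close to unity, which confirms that the Gamma-function prefactor and the Abel integral describe the same projection.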



We start our investigation of the redshift-space clustering
properties by computing $w_p(r_p)$ for a number of luminosity
bins and for early- and late-type galaxies separately.
To compare these projected correlation functions with the
2dFGRS results from Norberg et al. (2002a), we estimate
$w_p(r_p)$ using volume-limited samples with the same redshift
and magnitude selection criteria as those adopted by Norberg
et al. (2002a). For the MSB mocks (which use a stack
of $6 \times 6 \times 6$ L100 boxes), however, these $w_p(r_p)$ reveal a
systematic 'break' at $r_p \sim 10\,h^{-1}$ Mpc. As we have shown
in Section 4.2, this owes to the fact that, because of the
small box-size of the L100 simulation, the 2PCF is too small
on large scales (see Figs 6 and 7). We can circumvent this
problem by using MGRSs from the MB set, in which the
stack of $6 \times 6 \times 6$ L100 boxes is replaced by a stack of $2 \times 2 \times 2$
L300 boxes. However, these MGRSs are only complete down
to $M_{b_J} - 5\log h \simeq -18.4$ and can therefore only be used for
galaxies brighter than this.

The upper panels of Fig. 13 plot $w_p(r_p)$ for different
magnitude bins and for early- and late-type galaxies separately.
Except for the faintest magnitude bin, these projected
correlation functions are obtained from MGRSs in
the MB set. Results for the magnitude bin with $-17.5 >$
Figure 14. The correlation lengths, $r_0$, and slopes, $\gamma$, of the power-laws that best fit the projected correlation functions over the range
$2 < r_p < 15\,h^{-1}$ Mpc (solid squares). The results for the 2 faintest luminosity bins are based on the mean and variance of the sample
of 8 MSB mocks, while results for the other bins are based on the mean and variance of the sample of 4 MB mocks. Open circles with
errorbars correspond to the 2dFGRS data of Norberg et al. (2002a), and are shown for comparison. Except for a systematic overestimate
of the correlation lengths, the cause of which has been discussed in Section 4.2, there is good agreement between our MGRSs and the
2dFGRS.



$M_{b_J} - 5\log h > -18.5$ (solid lines) are obtained from the
MSB set. As discussed in Paper II, the projection significantly
washes out the features in the real-space 2PCFs at
$\sim 2\,h^{-1}$ Mpc, and the projected 2PCFs better resemble a
power-law. The exception is the $w_p(r_p)$ for the faintest subsample
of galaxies, where the 'break' mentioned above is
clearly visible. To highlight the luminosity and type dependence
of $w_p(r_p)$, the lower panels of Fig. 13 plot the ratios of
$w_p(r_p)$ to that of a reference sample defined as all (early-type
plus late-type) galaxies with $-19.5 > M_{b_J} - 5\log h > -20.5$.
For a given luminosity, the correlation amplitude is higher,
and the slope is steeper, for early-type galaxies than for late-type
galaxies. Significant changes in the slope (and thus
deviations from a perfect power-law) occur at separations
$r_p \sim 2\,h^{-1}$ Mpc, which is at least qualitatively in agreement
with recent results from the SDSS (Zehavi et al. 2003).

In order to facilitate a more direct comparison with
the 2dFGRS data, we fit a single power-law relation of
the form (28) to these $w_p(r_p)$ over the range $2\,h^{-1}\,{\rm Mpc} <
r_p < 15\,h^{-1}$ Mpc. This range is also adopted by Norberg et
al. (2002a) when fitting the projected 2PCFs obtained from
the 2dFGRS. Fig. 14 plots the real-space correlation lengths
$r_0$ and the slopes $\gamma$ thus obtained as function of luminosity
and galaxy type. The agreement between our MGRSs and
the 2dFGRS is acceptable. The slight but systematic overestimate
of $r_0$ is due to the effects discussed in Section 4.2.
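The power-law fit described above can be sketched as follows. Equation (28) implies $\log w_p = \log A(\gamma) + \gamma \log r_0 + (1-\gamma)\log r_p$, with $A(\gamma) = \sqrt{\pi}\,\Gamma(\gamma/2 - 1/2)/\Gamma(\gamma/2)$, so a straight-line fit in log-log space yields $\gamma$ from the slope and $r_0$ from the intercept. The unweighted least-squares fit used here is a simplifying assumption; the function names are illustrative, and a fit that uses the measured error bars may differ in detail.

```python
import math

def amplitude(gamma):
    """Prefactor A(gamma) of equation (28)."""
    return math.sqrt(math.pi) * math.gamma(gamma / 2 - 0.5) / math.gamma(gamma / 2)

def fit_power_law(rp, wp, rp_min=2.0, rp_max=15.0):
    """Unweighted log-log least-squares fit of equation (28); returns (r0, gamma)."""
    pts = [(math.log(r), math.log(w)) for r, w in zip(rp, wp) if rp_min < r < rp_max]
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    slope = sum((x - mx) * (y - my) for x, y in pts) / sum((x - mx) ** 2 for x, _ in pts)
    intercept = my - slope * mx
    gamma = 1.0 - slope                      # slope of log wp versus log rp is 1 - gamma
    r0 = math.exp((intercept - math.log(amplitude(gamma))) / gamma)
    return r0, gamma

# Synthetic, noise-free data generated from equation (28) with hypothetical parameters.
rp = [0.5, 1.0, 2.5, 4.0, 6.0, 9.0, 13.0, 20.0]
wp = [amplitude(1.75) * (5.5 / r) ** 1.75 * r for r in rp]
print(fit_power_law(rp, wp))
```

Only the points inside the fitting range contribute, which mirrors the restriction to $2 < r_p < 15\,h^{-1}$ Mpc adopted in the text.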

We now turn to a comparison of the projected correlation
function for the entire, flux-limited surveys. The upper-left
panel of Fig. 15 compares the $w_p(r_p)$ obtained from our
8 MSB and 4 MB MGRSs with that of the 2dFGRS obtained
by H03. The projected correlation functions from our MSBs
and MBs agree well with each other (i.e., the $1\sigma$ errorbars
overlap), and, at $r_p \gtrsim 3\,h^{-1}$ Mpc, with the 2dFGRS results.
Note that at $r_p \gtrsim 10\,h^{-1}$ Mpc the $w_p(r_p)$ obtained from
the MB mocks is slightly larger than that obtained from the
MSB mocks, again due to the effects discussed in Section 4.2.

At large scales, $w_p(r_p)$ is predominantly sensitive to the
halo occupation numbers $\langle N(M)\rangle$ and virtually independent
of the second moment of $P(N|M)$ or of details regarding the
spatial distribution of satellite galaxies. The good agreement
at large scales among different MGRSs and with the observations
therefore strongly supports our CLF, and shows that
any 'cosmic variance' among the different MGRSs has only
a relatively small impact on $w_p(r_p)$. On small scales, however,
the MGRSs reveal more correlation power (by about a
factor 2) than observed. On such scales, $w_p(r_p)$ is sensitive
to our assumptions about the second moment of $P(N|M)$
and, to a lesser degree, the spatial distribution of satellite
galaxies. We shall return to this small-scale mismatch and
its implications in Section 6 below.

Figure 15. The projected correlation function $w_p(r_p)$ (top-left panel), the redshift-space correlation function $\xi(s)$ (top-right), the
quadrupole-to-monopole ratio $q(s)$ (bottom-left), and the PVDs (bottom-right) for the samples of MSB (solid lines) and MB (dashed
lines) surveys. Error bars, which are similar for MB and MSB results, are only shown for the MSB results for clarity. These errorbars are
based on the variance of the 8 MSB surveys. The open circles with errorbars correspond to the 2dFGRS results obtained by Hawkins et
al. (2003), and are shown for comparison. Note that the MSBs and MBs give approximately the same results, but that there are marked
differences between model predictions and observations. Note also that the model errorbars are in general larger than the difference in
the mean between MB and MSB results, implying that these errorbars are statistical.



Rather than projecting $\xi(r_p,\pi)$, one may also average
$\xi(r_p,\pi)$ along constant $s = \sqrt{r_p^2 + \pi^2}$, yielding the redshift-space
2PCF $\xi(s)$. The upper-right panel of Fig. 15 plots
$\xi(s)$ obtained from our MGRSs, compared to the 2dFGRS
results from H03. We find a similar behavior as with the projected
correlation function; the 8 MSBs and 4 MBs agree
quite well with each other and with the observations at
Figure 16. Same as Fig. 15 except that here we compare the results for the MSB sample with those of three alternative MGRSs in
which we have modified the CLF to yield cluster mass-to-light ratios of $(M/L)_{\rm cl} = 1000h\,(M/L)_\odot$ (dotted lines), in which we adopt a
velocity bias of $b_{\rm vel} = 0.6$ (dashed lines), and in which we adopt a cosmology with $\sigma_8 = 0.75$ (dot-dashed lines). All results correspond
to the mean of the entire sample of 8 MSB mock surveys. For clarity, no errorbars are plotted here, but they are similar to those shown
in Fig. 15. Note that both the $(M/L)_{\rm cl} = 1000h\,(M/L)_\odot$ model and the $\sigma_8 = 0.75$ model are in good agreement with the observational
data.



$s \gtrsim 6\,h^{-1}$ Mpc. At smaller redshift-space separations, however,
the MGRSs slightly overpredict the correlation power.
Note that the MB samples predict higher $\xi(s)$ on small scales
than the MSB samples. This difference comes from the fact
that the MB samples are incomplete for galaxies fainter than
$M_{b_J} - 5\log h = -18.4$. To test this we construct a mock
survey from the MSB sample, but only accepting galaxies
brighter than this. This yields a $\xi(s)$ in excellent agreement
with that of the MB samples over all scales. Thus, although
the use of only large-box simulations can result in systematic
errors on small scales, the use of small-box simulations in
the MSB samples does not cause any significant systematic
errors on large scales.
5.2 Redshift Space Distortions

We now turn to a comparison of the detailed shape of
$\xi(r_p,\pi)$. In particular, we focus on the distortions with respect
to the real-space correlation function $\xi(r)$ induced by
the peculiar velocities of galaxies.

The two-dimensional correlation function $\xi(r_p,\pi)$ is often
modeled as a convolution of the real-space 2PCF $\xi(r)$
and the conditional distribution function $f(v_{12}|r)$:

$$ 1 + \xi(r_p,\pi) = \int_{-\infty}^{\infty} \left[1 + \xi\!\left(\sqrt{r_p^2 + (\pi - v_{12}/H_0)^2}\right)\right] f(v_{12}|r)\,dv_{12} \qquad (29) $$



(Peebles 1980). Here $v_{12}$ corresponds to the pairwise peculiar
velocity along the line-of-sight and $r$ corresponds to the real-space
separation. It is standard practice to assume an exponential
form for $f(v_{12}|r)$ and to ignore its dependence on
separation $r$ (cf., Davis & Peebles 1983; Mo, Jing & Börner
1993, 1997; Fisher et al. 1994; Marzke et al. 1995; Guzzo
et al. 1997; Jing, Börner & Suto 2002; Zehavi et al. 2002).
However, as we have shown in Section 4, the exponential
form is only adequate at small separations, and the PVD
varies quite strongly with separation. Furthermore, equation
(29) is only valid for an isotropic velocity field in the
limit where the probability of a real-space pair separation
$r$ is independent of the probability of an associated relative
velocity $v_{12}$. Although perhaps a reasonable approximation
on small, highly non-linear scales, it is certainly not valid in
linear theory, where the velocity and density fields are tightly
coupled. In an attempt to partially correct for this, one often
assumes that $f(v_{12})$ is the probability distribution for
the relative velocity about the mean. Using the self-similar
infall model, this mean pairwise peculiar velocity, $\langle v_{12}\rangle$, is
modeled as

$$ \langle v_{12}\rangle(r) = -H_0\,F\,\frac{y}{1 + (r/r_0)^2} \qquad (30) $$



(Davis & Peebles 1977) with $y = |\pi - v_{12}/H_0|$ the separation
in real-space along the line-of-sight. $F = 0$ corresponds to a
Universe without any flow other than the Hubble expansion,
while $F = 1$ corresponds to stable clustering. Given the
fairly ad hoc nature of this model, and the strong sensitivity
to the uncertain value of $F$ (Davis & Peebles 1983), great
care is required when interpreting any results based on this
model.

A more robust model is based on linear theory and on directly
modeling the infall velocities around density perturbations.
Following Kaiser (1987) and Hamilton (1992) one
can write the observed correlation function on linear scales
as

$$ \xi_{\rm lin}(r_p,\pi) = \xi_0(s)\,P_0(\mu) + \xi_2(s)\,P_2(\mu) + \xi_4(s)\,P_4(\mu). \qquad (31) $$

Here $P_l(\mu)$ is the $l$th Legendre polynomial, and $\mu$ is the
cosine of the angle between the line-of-sight and the redshift-space
separation $s$. According to linear perturbation theory
the angular moments can be written as



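$$ \xi_0(s) = \left(1 + \frac{2\beta}{3} + \frac{\beta^2}{5}\right)\xi(r) \qquad (32) $$

$$ \xi_2(s) = \left(\frac{4\beta}{3} + \frac{4\beta^2}{7}\right)\left[\xi(r) - \bar{\xi}(r)\right] \qquad (33) $$

$$ \xi_4(s) = \frac{8\beta^2}{35}\left[\xi(r) + \frac{5}{2}\bar{\xi}(r) - \frac{7}{2}\bar{\bar{\xi}}(r)\right] \qquad (34) $$

with

$$ \bar{\xi}(r) = \frac{3}{r^3}\int_0^r \xi(r')\,r'^2\,dr' , \qquad (35) $$

and

$$ \bar{\bar{\xi}}(r) = \frac{5}{r^5}\int_0^r \xi(r')\,r'^4\,dr' . \qquad (36) $$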
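For a pure power-law $\xi(r) = (r/r_0)^{-\gamma}$ the integrals in equations (35) and (36) reduce to $\bar{\xi} = 3\xi/(3-\gamma)$ and $\bar{\bar{\xi}} = 5\xi/(5-\gamma)$, so that the linear model of equation (31) can be evaluated in closed form. The sketch below is a minimal illustration under that assumption, with $\xi(r)$ evaluated at $r = s$; the function names and the parameter values are ours and are chosen only for illustration.

```python
import math

def xi_power_law(r, r0, gamma):
    """Real-space power-law 2PCF, xi(r) = (r/r0)**(-gamma)."""
    return (r / r0) ** (-gamma)

def multipoles_power_law(s, beta, r0, gamma):
    """Moments xi_0, xi_2, xi_4 of equations (32)-(34) for a power-law xi(r)."""
    xi = xi_power_law(s, r0, gamma)
    xi_bar = 3.0 / (3.0 - gamma) * xi       # equation (35), valid for gamma < 3
    xi_bbar = 5.0 / (5.0 - gamma) * xi      # equation (36)
    xi0 = (1.0 + 2.0 * beta / 3.0 + beta ** 2 / 5.0) * xi
    xi2 = (4.0 * beta / 3.0 + 4.0 * beta ** 2 / 7.0) * (xi - xi_bar)
    xi4 = 8.0 * beta ** 2 / 35.0 * (xi + 2.5 * xi_bar - 3.5 * xi_bbar)
    return xi0, xi2, xi4

def xi_lin(rp, pi, beta, r0, gamma):
    """Linear-theory xi(r_p, pi) of equation (31)."""
    s = math.hypot(rp, pi)
    mu = pi / s                              # cosine of the angle to the line of sight
    p2 = 0.5 * (3.0 * mu ** 2 - 1.0)
    p4 = (35.0 * mu ** 4 - 30.0 * mu ** 2 + 3.0) / 8.0
    xi0, xi2, xi4 = multipoles_power_law(s, beta, r0, gamma)
    return xi0 + xi2 * p2 + xi4 * p4

# Hypothetical parameters chosen for illustration only.
print(xi_lin(10.0, 0.0, 0.5, 5.0, 1.8), xi_lin(0.0, 10.0, 0.5, 5.0, 1.8))
```

For $\beta = 0$ the quadrupole and hexadecapole vanish and the model reduces to the isotropic real-space correlation function, which is a convenient sanity check.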



Given a value for $\beta$ and the real-space correlation function,
which can be obtained from $\xi(r_p,\pi)$ via the projected correlation
function $w_p(r_p)$, equation (31) yields a model for
$\xi(r_p,\pi)$ on linear scales that takes proper account of the coupling
between the density and velocity fields. To model the
non-linear virialized motions of galaxies within dark matter
haloes, one convolves this $\xi_{\rm lin}(r_p,\pi)$ with the distribution
function of pairwise peculiar velocities $f(v_{12}|r)$:

$$ 1 + \xi(r_p,\pi) = \int_{-\infty}^{\infty} \left[1 + \xi_{\rm lin}(r_p, \pi - v_{12}/H_0)\right] f(v_{12}|r)\,dv_{12} \qquad (37) $$



Thus, by modeling $\xi(r_p,\pi)$ one can hope to get both
an estimate of $\beta$ as well as information regarding the pairwise
peculiar velocity distribution. We follow H03, and assume
that the real-space 2PCF is a pure power-law, $\xi(r) =
(r/r_0)^{-\gamma}$, and that $f(v_{12}|r)$ is an exponential that is independent
of the real-space separation $r$:

$$ f(v_{12}|r) = f(v_{12}) = \frac{1}{\sqrt{2}\,\sigma_{12}} \exp\left(-\frac{\sqrt{2}\,|v_{12}|}{\sigma_{12}}\right) \qquad (38) $$
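The exponential form of equation (38) is normalized to unity and has a second moment equal to $\sigma_{12}^2$, which is why $\sigma_{12}$ can be interpreted as the pairwise velocity dispersion. The following minimal sketch verifies both properties numerically; the function names, the integration limits, and the value of $\sigma_{12}$ are illustrative assumptions.

```python
import math

def f_exp(v12, sigma12):
    """Exponential pairwise velocity distribution of equation (38)."""
    return math.exp(-math.sqrt(2.0) * abs(v12) / sigma12) / (math.sqrt(2.0) * sigma12)

def moment(k, sigma12, vmax_factor=40.0, n=400_000):
    """Midpoint-rule estimate of int v^k f(v) dv over [-vmax, vmax], vmax = vmax_factor * sigma12."""
    vmax = vmax_factor * sigma12
    h = 2.0 * vmax / n
    total = 0.0
    for i in range(n):
        v = -vmax + (i + 0.5) * h
        total += v ** k * f_exp(v, sigma12)
    return total * h

sigma12 = 500.0  # km/s, hypothetical value for illustration
print(moment(0, sigma12), math.sqrt(moment(2, sigma12)))
```

The first printed number should be close to 1 and the second close to the input dispersion.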



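Using a simple $\chi^2$ minimization technique, we fit these models,
described by the four parameters $\beta$, $\sigma_{12}$, $r_0$, and $\gamma$, to
the $\xi(r_p,\pi)$ in each of our 8 MSB and 4 MB MGRSs. The
$\chi^2$ is defined as

$$ \chi^2 = \sum \left(\frac{\log[1+\xi]_{\rm model} - \log[1+\xi]_{\rm data}}{\log[1+\xi+\Delta\xi]_{\rm data} - \log[1+\xi-\Delta\xi]_{\rm data}}\right)^2 , \qquad (39) $$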



where the summation is over the $\xi(r_p,\pi)$ data grid with
the restriction $8\,h^{-1}\,{\rm Mpc} < s < 20\,h^{-1}$ Mpc (see H03) and
$\Delta\xi(r_p,\pi)$ is the rms of $\xi(r_p,\pi)$ determined from each of our
8 MSB MGRSs (or of our 4 MB MGRSs). The averages
(over the 8 or the 4 MGRSs) of the best-fit values for $\beta$,
$\sigma_{12}$, $r_0$, and $\gamma$, along with the variances among different
samples, are listed in the first two lines of Table 1. These
should be compared with the values listed in the last line,
which correspond to the best-fit values obtained from the
2dFGRS by H03 using exactly the same method. As one
can see, the best-fit values for $\beta$ and the correlation lengths of
the MSB and the MB samples agree with each other and with
the 2dFGRS value at better than the $1\sigma$ level. On the other
hand, the discrepancies regarding $\gamma$ and $\sigma_{12}$ are significant:
both are significantly higher in our MGRSs than
in the 2dFGRS.
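The $\chi^2$ statistic of equation (39) can be sketched as follows. The function operates on flat lists of model values, data values, and rms errors for the grid cells that satisfy the restriction on $s$; the names and the toy numbers are illustrative assumptions. Note that the base of the logarithm cancels in the ratio, and that each cell must satisfy $1 + \xi - \Delta\xi > 0$ for the statistic to be defined.

```python
import math

def chi2_log(xi_model, xi_data, delta_xi):
    """Chi-square of equation (39), summed over the supplied grid cells."""
    total = 0.0
    for m, d, e in zip(xi_model, xi_data, delta_xi):
        num = math.log10(1.0 + m) - math.log10(1.0 + d)
        den = math.log10(1.0 + d + e) - math.log10(1.0 + d - e)
        total += (num / den) ** 2
    return total

def select_cells(rp, pi, s_min=8.0, s_max=20.0):
    """Indices of grid cells with s_min < s < s_max, where s = sqrt(rp^2 + pi^2)."""
    return [i for i, (a, b) in enumerate(zip(rp, pi)) if s_min < math.hypot(a, b) < s_max]

# Toy numbers, for illustration only.
data = [1.0, 0.4, 0.2]
errors = [0.5, 0.1, 0.05]
print(chi2_log(data, data, errors), chi2_log([2.0, 0.4, 0.2], data, errors))
```

A model that reproduces the data exactly gives $\chi^2 = 0$; in practice the function would be passed to a minimizer that varies $\beta$, $\sigma_{12}$, $r_0$, and $\gamma$.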

In order to investigate these discrepancies in more detail 
we compute two statistics of the redshift space distortions 
which we compare to the 2dFGRS. As above, we take great 
care in using exactly the same method and assumptions as 
H03. Therefore, even if some aspects of the model are ques- 
tionable, this allows a meaningful comparison of our results 
Table 1. Best fit parameters.

Survey             $\beta$        $r_0$          $\gamma$       $\sigma_{12}$
(1)                (2)            (3)            (4)            (5)

MSBs               0.52 ± 0.05    5.78 ± 0.23    1.99 ± 0.03    687 ± 37
MBs                0.56 ± 0.06    5.93 ± 0.27    1.95 ± 0.03    732 ± 33
$(M/L)_{\rm cl}$   0.52 ± 0.08    4.96 ± 0.21    1.88 ± 0.05    487 ± 44
$b_{\rm vel}$      0.47 ± 0.06    5.99 ± 0.19    2.00 ± 0.06    497 ± 47
$\sigma_8$         0.51 ± 0.06    5.19 ± 0.17    1.91 ± 0.05    505 ± 25
2dFGRS             0.49 ± 0.09    5.80 ± 0.25    1.78 ± 0.06    514 ± 31



The values of $\beta$, $r_0$ (in $h^{-1}$ Mpc), $\gamma$, and $\sigma_{12}$ (in km s$^{-1}$) that
best fit the $\xi(r_p,\pi)$ for $8\,h^{-1}\,{\rm Mpc} < s < 20\,h^{-1}$ Mpc for a number
of different MGRSs. Note that 4 MGRSs are used for MBs, while
8 MGRSs are used for all other cases. The quoted values are the
mean and $1\sigma$ variance of these MGRSs. The MGRSs denoted by
'$(M/L)_{\rm cl}$' are similar to the MGRSs in the MSB set, except that
here the CLF is constrained to mass-to-light ratios for clusters of
$(M/L)_{\rm cl} = 1000h\,(M/L)_\odot$, rather than $(M/L)_{\rm cl} = 500h\,(M/L)_\odot$
as in MSB (see Section 6.3). The MGRSs denoted by '$b_{\rm vel}$' are
similar except for a velocity bias of $b_{\rm vel} = \sigma_{\rm gal}/\sigma_{\rm DM} = 0.6$ (see
Section 6.2). The MGRSs denoted by '$\sigma_8$' are also similar except
that they adopt a flat $\Lambda$CDM cosmology with $\sigma_8 = 0.75$ rather than
0.9 (see Section 6.4). The final line lists the best-fit parameters
obtained by Hawkins et al. (2003) by fitting the $\xi(r_p,\pi)$ obtained
from the 2dFGRS. Note that the errors in Hawkins et al. are
estimated from the spread of 22 mock samples.



with those obtained by H03. First of all, we compute the
modified quadrupole-to-monopole ratio

$$ q(s) = \frac{\xi_2(s)}{\frac{3}{s^3}\int_0^s \xi_0(s')\,s'^2\,ds' - \xi_0(s)} \qquad (40) $$

where $\xi_l(s)$ is given by

$$ \xi_l(s) = \frac{2l+1}{2}\int_{-1}^{+1} \xi(r_p,\pi)\,P_l(\mu)\,d\mu . \qquad (41) $$

The lower-left panel of Fig. 15 plots $q(s)$ for MSBs and MBs
together with the 2dFGRS results (open circles with errorbars).
Although the MGRSs reveal the same overall behavior
as the 2dFGRS data, and are mutually consistent, they
systematically overpredict $q(s)$. On small scales, where random
peculiar velocities cause a rapid increase of $q(s)$, this
indicates that the virialized motions in our mock surveys are
larger than observed (see also below). On large scales, where
$q(s)$ asymptotes to the linear theory value of

$$ q(s) = -\frac{\frac{4}{3}\beta + \frac{4}{7}\beta^2}{1 + \frac{2}{3}\beta + \frac{1}{5}\beta^2} \qquad (42) $$

this might indicate that the value of $\beta$ inherent to our
MGRSs is too small compared to the real Universe. On the
other hand, Cole, Fisher & Weinberg (1994) have shown that
non-linear, small-scale power can affect $q(s)$ out to fairly
large separations. Therefore the systematic overestimate of
$q$ at large $s$ may simply be a reflection of the random peculiar
velocities being too large, rather than an inconsistency
regarding the value of $\beta$.

The second statistic that we use to compare the redshift
space distortions in our MGRSs with those of the 2dFGRS
is the PVD, $\sigma_{12}(r_p)$, as a function of projected radius
$r_p$. Following H03, we keep $r_0$, $\gamma$ and $\beta$ fixed at the 'global'
values listed in Table 1 and determine $\sigma_{12}(r_p)$ by minimizing
$\chi^2$ in a number of independent $r_p$ bins‖. The results
are shown in the lower-right panel of Fig. 15. Whereas the
2dFGRS reveals a $\sigma_{12}(r_p)$ that is almost constant with radius
at about 500-600 km s$^{-1}$, our MGRSs reveal a strong
increase from $\sigma_{12} \sim 600$ km s$^{-1}$ at $r_p = 0.1\,h^{-1}$ Mpc to
$\sigma_{12} \sim 900$ km s$^{-1}$ at $r_p = 1.0\,h^{-1}$ Mpc, followed by a decrease
to $\sigma_{12} \sim 500$ km s$^{-1}$ at $r_p = 10\,h^{-1}$ Mpc. Thus, at around
$1\,h^{-1}$ Mpc, our MGRSs dramatically overestimate the PVD.
Although there is a non-negligible amount of scatter among
the different mock surveys, reflecting the extreme sensitivity
of the PVDs to the few richest systems in the survey, the
variance among the 8 (4) MGRSs is small compared to the
discrepancy.

As shown by Peacock et al. (2001) the best-fit values of
$\sigma_{12}$ and $\beta$ are highly degenerate. We have tested the impact
of this degeneracy on our $\sigma_{12}(r_p)$ by repeating the same exercise
using a value for $\beta$ that is 0.1 larger (smaller) than
the values listed in Table 1. This leads to an increase (decrease)
of $\sigma_{12}(r_p)$ of the order of 5 percent (20 percent) at
projected radii of $1\,h^{-1}$ Mpc ($10\,h^{-1}$ Mpc). Given that our
MGRSs overpredict the PVD at $r_p = 1\,h^{-1}$ Mpc by about 70
percent, it is clear that this discrepancy is not a reflection
of the $\beta$-$\sigma_{12}$ degeneracy. Thus, the standard $\Lambda$CDM model
seems to have a severe problem in matching the observed
PVDs on intermediate scales.



6 TOWARDS A SELF-CONSISTENT MODEL 
FOR LARGE SCALE STRUCTURE 

Our MGRSs, based on a flat $\Lambda$CDM concordance cosmology
with $\Omega_m = 0.3$ and $\sigma_8 = 0.9$, and on a CLF that is
required to yield cluster mass-to-light ratios of $(M/L)_{\rm cl} =
500h\,(M/L)_\odot$, reveal clustering statistics that are overall in
reasonable agreement with the data from the 2dFGRS. Nevertheless,
two discrepancies have come to light: the MGRSs
predict too much power on small scales and PVDs that are
too high. We now investigate possible ways to alleviate these
discrepancies.



6.1 Halo occupation models 

The discrepancies between our MGRS and the 2dFGRS re- 
sults might indicate a problem with our halo occupation 
models. Although the CLF is fairly well constrained by the 
observed luminosity function and the observed luminosity 
dependence of the correlation lengths (see Papers I and II),
we have made additional assumptions regarding the second 
moments of the halo occupation number distributions and 
regarding the distribution of galaxies within individual dark 
matter haloes. 

As we have shown in Section 4.2 the real space cor- 
relation function depends only very weakly on our method 
of distributing satellite galaxies within dark matter haloes 
(cf. Fig. 8). We have verified, using a number of tests, that 
modifications of the spatial distribution of satellite galaxies 
within dark matter haloes have no significant influence on 



‖ Note that the PVDs thus obtained are a kind of average of the
true PVD along the line-of-sight. Therefore, these PVDs should
not be compared directly to the true PVD shown in Fig. 9.
$w_p(r_p)$ or on $\sigma_{12}(r_p)$. Therefore, none of the discrepancies
mentioned above can be attributed to errors in our satellite
model.

Our results are more susceptible to changes in the second
moment of the halo occupation number distributions.
On small scales, $\xi(r)$ scales with the average number of
galaxy pairs in individual haloes, $\langle \frac{1}{2}N(N-1)\rangle$. Therefore,
one can decrease the power on small scales, to bring our
$w_p(r_p)$ in better agreement with observations, by lowering
the second moments of our halo occupation distributions.
However, our distribution (16) is already the narrowest distribution
possible, and modifying the second moment of the
halo occupation distributions can therefore only aggravate
the discrepancies on small scales.

6.2 Velocity Bias 

A seemingly obvious explanation for the too high PVDs is
that the peculiar velocities of galaxies are biased with respect
to the dark matter. We define the velocity bias (sometimes
called 'dynamical' bias) as $b_{\rm vel} = \sigma_{\rm gal}/\sigma_{\rm DM}$, with
$\sigma_{\rm gal}$ and $\sigma_{\rm DM}$ the peculiar velocity dispersions of (satellite)
galaxies and dark matter particles in a given halo, respectively.
Note that in our fiducial MGRSs we adopt $b_{\rm vel} = 1$
(i.e. no velocity bias). Fig. 16 shows $w_p(r_p)$, $\xi(s)$, $q(s)$ and
$\sigma_{12}(r_p)$ for MGRSs (with the MSB configuration of simulation
boxes) in which $b_{\rm vel} = 0.6$; i.e., the velocity dispersion
of satellite galaxies is only 60 percent of that of the dark
matter particles in the same halo (dashed lines). With such
a pronounced velocity bias, both $q(s)$ and the PVDs, as well
as $\beta$ and the global value of $\sigma_{12}$ listed in Table 1, are all
consistent with the 2dFGRS results.

In two recent papers, Berlind et al. (2003) and
Yoshikawa et al. (2003) measured the velocity bias of 'galaxies'
in a smoothed particle hydrodynamics (SPH) simulation
and found that $b_{\rm vel}$ decreases from $b_{\rm vel} \sim 0.9$-$1.0$ for haloes
with $M \sim 3 \times 10^{14}\,h^{-1}\,M_\odot$ to $b_{\rm vel} \sim 0.6$-$0.8$ for haloes
with $M \sim 3 \times 10^{12}\,h^{-1}\,M_\odot$. Thus, although simulations predict
that low-mass haloes might have values for the velocity
bias as low as $b_{\rm vel} \sim 0.6$, the value required here to bring our PVDs
in agreement with observations, the PVD is dominated by
galaxies in massive haloes for which these same simulations
apparently predict close to $b_{\rm vel} = 1$ (i.e., no velocity bias).
Furthermore, the introduction of velocity bias cannot solve
the excess power on small scales. After all, the real-space correlation
function is independent of $b_{\rm vel}$, such that the discrepancies
regarding $w_p(r_p)$ on small scales remain (see upper-left
panel of Fig. 16). To make matters worse, reducing the
peculiar velocities of galaxies inside dark matter haloes increases
the small-scale power in redshift space. This means
that the redshift space correlation function $\xi(s)$ actually becomes
more discrepant with the 2dFGRS data (see upper-right
panel of Fig. 16). Therefore, although some amount
of velocity bias might be expected, we do not consider it a
viable solution for the problems mentioned above.

6.3 Cluster mass-to-light ratios 

Because the PVD is a pair-weighted statistic it is extremely
sensitive to the few richest systems in the sample (e.g., Mo,
Jing & Börner 1993, 1997; Zurek et al. 1994; Marzke et
al. 1995; Somerville, Primack & Nolthenius 1997). The fact
that the PVDs in our MGRSs are too large compared with
observations therefore might indicate that either there are
too many clusters of galaxies in our mock surveys (see Section
6.4 below), or that these clusters contain too many
galaxies.

Our CLF was constructed under the constraint that the
average mass-to-light ratio of haloes with $M > 10^{14}\,h^{-1}\,M_\odot$
is equal to $(M/L)_{\rm cl} = 500h\,(M/L)_\odot$ (in the photometric
$b_J$ band). This value is motivated by the average mass-to-light
ratio of clusters obtained by Fukugita, Hogan & Peebles
(1998). To reduce the number of galaxies per cluster we now
set $(M/L)_{\rm cl} = 1000h\,(M/L)_\odot$ and repeat the entire exercise:
we first use the method described in Section 2 to compute
the parameters of the new conditional luminosity function.
This CLF is used to construct new MGRSs (using the same
configuration of simulation boxes as in MSB), from which
we determine the same statistics as before.

The results are listed in Table 1 and shown as dotted
lines in Fig. 16. Clearly, increasing the mass-to-light ratio
of clusters lowers $q(s)$ and $\sigma_{12}(r_p)$, bringing them in better
agreement with the 2dFGRS results. Although the PVDs are
still somewhat too high, especially at around $0.4\,h^{-1}$ Mpc,
the extent of this discrepancy is similar to the $1\sigma$ variance
among the 8 MGRSs, indicating that this remaining difference
is consistent with 'cosmic variance'. As can be seen from
Table 1, both $\beta$ and $\gamma$ are now in much better agreement
with the 2dFGRS. In addition, the reduction of the number
of galaxies in clusters significantly reduces $w_p(r_p)$ at small
projected separations (see Fig. 16), bringing it in good agreement
with the observations**. A similar reduction of small-scale
power is also evident in $\xi(s)$. Thus, these particular
MGRSs have clustering characteristics that are overall in
good agreement with the 2dFGRS results. The question is
therefore whether or not such a high mass-to-light ratio for
clusters of galaxies is compatible with observations.

The cluster mass-to-light ratios quoted by Fukugita et
al. (1998) are $(450 \pm 100)h\,(M/L)_\odot$ in the B-band, based on
X-ray and velocity-dispersion data. Taking these numbers
at face value, a cluster mass-to-light ratio of $(M/L)_{\rm cl} =
1000h\,(M/L)_\odot$ is ruled out at the $5\sigma$ level. Using a variety
of methods to estimate cluster masses, Bahcall et al. (2000)
obtained $(M/L)_B = (330 \pm 77)h\,(M/L)_\odot$, which is consistent
with the results of Carlberg et al. (1996), $(M/L)_B = (363 \pm
65)h\,(M/L)_\odot$, based on galaxy kinematics in clusters. Taking
the average of these two measurements yields $(M/L_B)_{\rm cl} =
(350 \pm 70)h\,(M/L)_\odot$, which rules out the cluster mass-to-light
ratio required to match the clustering power on small
scales at more than $7\sigma$. Thus, unless the cluster mass-to-light
ratios obtained from current observations are seriously
in error, increasing the average cluster mass-to-light ratio to
$(M/L)_{\rm cl} \sim 1000h\,(M/L)_\odot$ does not seem a viable solution
for the problems at hand.
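The significance levels quoted above follow from simple arithmetic: the offset between the required and the measured mass-to-light ratio is divided by the quoted $1\sigma$ error. The helper below is a minimal sketch of that calculation, assuming Gaussian errors and using only the numbers quoted in the text (all in units of $h\,(M/L)_\odot$); the function name is ours.

```python
def n_sigma(required, measured, error):
    """Offset between a required and a measured value in units of the 1-sigma error."""
    return abs(required - measured) / error

required = 1000.0
# Fukugita et al. (1998): 450 +/- 100
print(n_sigma(required, 450.0, 100.0))
# Average of Bahcall et al. (2000) and Carlberg et al. (1996) as adopted in the text: 350 +/- 70
print(n_sigma(required, 350.0, 70.0))
```

The first offset is 5.5 and the second is about 9.3, consistent with the statements that the high mass-to-light ratio is excluded at the $5\sigma$ level and at more than $7\sigma$, respectively.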



** Note that the correlation amplitude predicted with $(M/L)_{\rm cl} =
1000h\,(M/L)_\odot$ is slightly lower than the observed amplitude, because
in this model more galaxies are assigned to small haloes in
order to match the observed luminosity function. Since the errorbars
on the observed correlation lengths are relatively large,
the model tends to compromise the accuracy of the fit to the
correlation lengths.
6.4 Power spectrum normalization

Rather than lowering the average number of galaxies per
cluster, we may also hope to lower the PVDs by reducing the
actual number of clusters. As we have shown in Section 5.2,
our results, and thus the number density of rich clusters, are
robust against cosmic variance. Therefore, a lower number
density of clusters implies a different cosmological model. It
is well known that the abundance of (rich) clusters is extremely
sensitive to the power-spectrum normalization parameter
$\sigma_8$. The too high PVDs could thus be indicative of
a too high value for $\sigma_8$.

We therefore wish to compute the PVDs in a $\Lambda$CDM
cosmology with identical cosmological parameters as before,
except that $\sigma_8 = 0.75$ rather than 0.9. Note that the choice
of $\sigma_8 = 0.75$ is somewhat arbitrary, but it does represent
a compromise between the constraints on the value of $\sigma_8$
from various observations and the low value required by our
results on the PVDs (see below). In principle, constructing
new MGRSs for a different cosmology requires new $N$-body
simulations of the dark matter distribution. This, however,
is computationally too expensive, which is why we use an
approximate method instead. First, we compute the new
best-fit parameters of the CLF for this cosmology, again
demanding that $(M/L)_{\rm cl} = 500h\,(M/L)_\odot$. Next we populate
the dark matter haloes in our $\sigma_8 = 0.9$ simulation boxes with
galaxies according to this new CLF. Finally, we construct
a new sample of 8 MSB MGRSs, in which we weight each
galaxy in a halo of mass $M$ by

$$ \frac{n(M|\sigma_8 = 0.75)}{n(M|\sigma_8 = 0.9)} \qquad (43) $$



with $n(M)$ the number density of dark matter haloes of mass
$M$. This, to first order, mimics the effect of lowering $\sigma_8$
on the halo mass function, and so should be a reasonable
approximation on small scales where the clustering properties
are determined by the galaxy distribution in individual
haloes††. The results for these MGRSs are shown as dot-dashed
lines in Fig. 16. As one can see, the agreement with
observational results in this model is much better than in
the standard $\Lambda$CDM model. These results suggest that a
$\Lambda$CDM model with $\sigma_8 \sim 0.75$ may match all the observational
results obtained from the 2dFGRS. Unfortunately, in
the absence of proper $N$-body simulations for this model,
it is impossible to make a more detailed comparison with
observation.
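The reweighting of equation (43) can be sketched as follows. The helper takes two callables that return the halo number density for the target and the simulated cosmology and returns the weight for a halo of mass $M$. The mass function `toy_mass_function` and its cut-off masses are purely hypothetical stand-ins, introduced only to make the example runnable; they are not the mass functions used in this paper.

```python
import math

def halo_weight(mass, n_target, n_simulated):
    """Weight of equation (43): n(M | target sigma_8) / n(M | simulated sigma_8)."""
    return n_target(mass) / n_simulated(mass)

def toy_mass_function(m_cut):
    """Hypothetical mass function n(M) ~ M**-1.9 * exp(-M / m_cut), for illustration only."""
    return lambda mass: mass ** -1.9 * math.exp(-mass / m_cut)

# Hypothetical cut-off masses standing in for the low and high sigma_8 models.
n_low = toy_mass_function(5e13)
n_high = toy_mass_function(1e14)

for mass in (1e12, 1e13, 1e14, 1e15):
    print(mass, halo_weight(mass, n_low, n_high))
```

In this toy example galaxies in low-mass haloes keep a weight close to unity, while galaxies in the most massive haloes are strongly down-weighted, which is the qualitative behavior expected when the abundance of rich clusters is reduced.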

The question is, of course, whether such a low $\sigma_8$ is
compatible with other independent observations. Currently,
the value of $\sigma_8$ is constrained mainly by three types of
observations: weak lensing surveys, cluster mass functions,
and anisotropy in the cosmic microwave background. Recent
cluster abundance analyses give values of $\sigma_8$ (assuming
$\Omega_m = 0.3$) in a wide range, from 0.6 to 1 (e.g. Borgani
et al. 2001; Seljak 2002; Viana, Nichol & Liddle 2002; Pen
1998; Fan & Bahcall 1998; Reiprich & Böhringer 2002). Results
from weak lensing surveys are equally uncertain, with

†† Note that for haloes with a given mass, the concentration
parameters are smaller in the $\sigma_8 = 0.75$ model than they are
in the standard $\Lambda$CDM model. This change of concentration is
taken into account in our analyses, even though its effect is almost
negligible.



$\sigma_8$ spanning the range 0.7 to $\sim 1$ (e.g., Jarvis et al. 2003;
Hoekstra, Yee & Gladders 2002; Refregier et al. 2002; Bacon
et al. 2003). Thus, our preferred value, $\sigma_8 = 0.75$, is consistent
with these observations. At the moment, the most stringent
constraint on the value of $\sigma_8$ is from WMAP (Spergel
et al. 2003): $\sigma_8 = 0.84 \pm 0.04$ ($1\sigma$ error). Even taking this
result at face value, one cannot rule out $\sigma_8 = 0.75$ with
any high confidence. Thus, there is no strong observational
evidence to argue against a $\Lambda$CDM model with $\sigma_8 = 0.75$.
Furthermore, as discussed in van den Bosch, Mo & Yang
(2003), a value of $\sigma_8$ as low as 0.75 can also help to alleviate
several problems in current models of galaxy formation,
such as the ones in connection to the Tully-Fisher relation
and to the rotation curve shapes of low-surface brightness
galaxies. Our results presented here give additional support
for a relatively low value of $\sigma_8$.



7 CONCLUSIONS 

In this paper, we have used realistic halo occupation distributions,
obtained using the conditional luminosity function
technique introduced by Yang, Mo & van den Bosch (2003),
to populate dark matter haloes in high-resolution simulations
of the $\Lambda$CDM 'concordance' cosmology. The simulations
follow the evolution of $512^3$ dark matter particles in
periodic boxes of $100\,h^{-1}$ Mpc and $300\,h^{-1}$ Mpc on a side.
Subsequently, the dark matter haloes identified in these simulations
are populated with galaxies of different luminosity
and different morphological type.

We have shown that the luminosity functions and the
correlation lengths as function of luminosity, both for the
early- and the late-type galaxies, are in good agreement with
observations. Since these same observations were used to
constrain the conditional luminosity functions, which in turn
were used to populate the dark matter haloes, this agreement
shows that the halo occupation statistics obtained analytically
can be implemented reliably in $N$-body simulations
to construct realistic, self-consistent, mock galaxy distributions.
We have demonstrated that the details of the spatial
distribution of galaxies within individual dark matter haloes
have only a very mild effect on the two-point correlation
function, and only at real-space separations $r \lesssim 0.3\,h^{-1}$ Mpc.

The mean pairwise peculiar velocities, $\langle v_{12} \rangle$, however, 
depend rather strongly on whether satellite galaxies (any 
galaxy in a dark matter halo other than the most luminous, 
central galaxy) are associated with random dark matter 
particles of the friends-of-friends (FOF) group, or whether 
they are assigned peculiar velocities assuming a spherical, 
isotropic velocity distribution around the central galaxy. In 
the former case, $\langle v_{12} \rangle$, which indicates the amount of infall 
around overdense regions, is similar to that of the dark 
matter. In the latter case, $\langle v_{12} \rangle$ is significantly suppressed 
with respect to the dark matter. This difference indicates 
that the outer parts of the FOF groups are not yet virialized. 
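
As a minimal sketch of the second assignment scheme, the snippet below draws each Cartesian velocity component of a satellite from a Gaussian centred on the velocity of the central galaxy. The Gaussian form, the function name `assign_satellite_velocities`, and the numerical values are assumptions made purely for illustration; they are not taken from the simulations described in this paper.

```python
import random


def assign_satellite_velocities(v_central, sigma_1d, n_sat, rng=None):
    """Draw isotropic satellite velocities around a central galaxy.

    v_central : (vx, vy, vz) of the central galaxy in km/s
    sigma_1d  : assumed one-dimensional velocity dispersion in km/s
    n_sat     : number of satellites in the halo
    rng       : optional random.Random instance for reproducibility

    Each component is drawn independently from a Gaussian of width
    sigma_1d (an illustrative assumption), so the distribution is
    spherical and isotropic about the central galaxy.
    """
    rng = rng if rng is not None else random.Random()
    velocities = []
    for _ in range(n_sat):
        velocities.append(
            tuple(vc + rng.gauss(0.0, sigma_1d) for vc in v_central)
        )
    return velocities


# Hypothetical halo: central moving at (100, -50, 0) km/s, five satellites.
satellites = assign_satellite_velocities((100.0, -50.0, 0.0), 500.0, 5)
```

By construction the satellites in this sketch have no mean motion relative to their central galaxy, so any infall present in the outer, unvirialized parts of the FOF group is erased, which is consistent with the suppression of the mean pairwise velocity noted above.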

The pairwise velocity dispersions (PVDs) of the galaxies 
are found to be significantly smaller than those of dark 
matter particles. Since the PVD is a pair-weighted measure 
of the potential wells in which dark matter particles (or galaxies) 
reside, this can be understood as long as the average 
number of galaxies per unit halo mass, $N/M$, decreases with 
$M$ (Jing et al. 1998). Indeed, the halo occupation numbers 
inferred from our conditional luminosity function indicate 
that $N/M \propto M^{\alpha}$ with $\alpha \sim -0.2$. 
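
The pair-weighting argument can be illustrated with a toy calculation. The sketch below assumes, purely for illustration, a virial scaling of the one-dimensional velocity dispersion with halo mass ($\sigma \propto M^{1/3}$), a power-law halo abundance, and that only pairs within the same halo contribute; the function name `pair_weighted_pvd` and all numerical values are hypothetical and are not the model used in this paper.

```python
def pair_weighted_pvd(masses, abundances, alpha,
                      sigma_ref=500.0, m_ref=1.0e14):
    """Toy pair-weighted pairwise velocity dispersion in km/s.

    masses     : halo masses (same units as m_ref)
    abundances : relative number of haloes at each mass
    alpha      : slope of N/M with M, i.e. N is proportional to M**(1 + alpha)
    sigma_ref  : assumed 1D dispersion of a halo of mass m_ref (km/s)
    """
    numerator = 0.0
    denominator = 0.0
    for mass, n_halo in zip(masses, abundances):
        n_obj = (mass / m_ref) ** (1.0 + alpha)                # occupation number
        sigma_1d = sigma_ref * (mass / m_ref) ** (1.0 / 3.0)   # assumed virial scaling
        weight = n_halo * n_obj ** 2                           # pairs scale as N^2
        numerator += weight * 2.0 * sigma_1d ** 2              # variance of a velocity difference
        denominator += weight
    return (numerator / denominator) ** 0.5


# Hypothetical halo sample: masses from 1e11 to 1e15, abundance ~ M^-1.9.
masses = [10.0 ** (11.0 + 0.5 * i) for i in range(9)]
abundances = [m ** -1.9 for m in masses]

pvd_dark_matter = pair_weighted_pvd(masses, abundances, alpha=0.0)
pvd_galaxies = pair_weighted_pvd(masses, abundances, alpha=-0.2)
```

In this toy model, setting `alpha=0.0` mimics dark matter particles (number proportional to mass), while `alpha=-0.2` shifts the pair weight towards lower-mass haloes with shallower potential wells, so `pvd_galaxies` comes out smaller than `pvd_dark_matter`.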

Stacking a number of $100\,h^{-1}$ Mpc and $300\,h^{-1}$ Mpc 
simulation boxes allows us to construct mock galaxy redshift 
surveys (MGRSs) that are comparable to the 2dFGRS in 
terms of sky coverage, depth, and magnitude limit. For each 
of these MGRSs we estimate the two-point correlation 
function $\xi(r_p, \pi)$. These are used to derive a number of 
statistics on the large-scale distribution of galaxies, which we 
compare directly with the 2dFGRS results. In particular, we 
calculate the projected 2PCFs $w_p(r_p)$ as a function of 
luminosity and type. The best-fit power-law slopes and correlation 
lengths of these projected correlation functions are in 
good agreement with the 2dFGRS results obtained by 
Norberg et al. (2002a). In addition, we compute $w_p(r_p)$ 
and the redshift-space correlation function $\xi(s)$ for the 
entire MGRSs. These are compared to the 2dFGRS results 
obtained by Hawkins et al. (2003). 
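
For reference, the projection step can be sketched numerically. The snippet below assumes the standard definition $w_p(r_p) = 2 \int_0^{\pi_{\rm max}} \xi(r_p, \pi)\, {\rm d}\pi$ applied to an isotropic power-law $\xi(r) = (r/r_0)^{-\gamma}$, and compares it with the well-known analytic result for $\pi_{\rm max} \to \infty$. The function names and the values $r_0 = 5\,h^{-1}$ Mpc, $\gamma = 1.8$ and $\pi_{\rm max} = 40\,h^{-1}$ Mpc are illustrative assumptions, not the choices made for the MGRSs.

```python
import math


def xi_power_law(r, r0=5.0, gamma=1.8):
    """Isotropic power-law correlation function (illustrative parameters)."""
    return (r / r0) ** (-gamma)


def projected_wp(r_p, xi, pi_max=40.0, n_steps=4000):
    """w_p(r_p) = 2 * integral from 0 to pi_max of xi(sqrt(r_p^2 + pi^2)) d(pi).

    Uses a simple midpoint rule along the line of sight.
    """
    d_pi = pi_max / n_steps
    total = 0.0
    for i in range(n_steps):
        pi_los = (i + 0.5) * d_pi              # midpoint of the i-th bin
        total += xi(math.hypot(r_p, pi_los))   # r = sqrt(r_p^2 + pi^2)
    return 2.0 * total * d_pi


def wp_power_law_analytic(r_p, r0=5.0, gamma=1.8):
    """Analytic w_p for a power-law xi in the limit pi_max -> infinity."""
    gamma_ratio = (math.gamma(0.5) * math.gamma((gamma - 1.0) / 2.0)
                   / math.gamma(gamma / 2.0))
    return r_p * (r0 / r_p) ** gamma * gamma_ratio


wp_numeric = projected_wp(1.0, xi_power_law)
wp_analytic = wp_power_law_analytic(1.0)
```

Because the line-of-sight integral is truncated at a finite `pi_max`, `wp_numeric` falls slightly below `wp_analytic`; the difference shrinks as `pi_max` is increased.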

Although the agreement with the 2dFGRS data is excellent 
on scales larger than $\sim 3\,h^{-1}$ Mpc, on smaller scales 
$w_p(r_p)$ is about a factor of two larger than observed. To 
investigate this in more detail, we analyzed the redshift-space 
distortions present in $\xi(r_p, \pi)$ by computing the 
quadrupole-to-monopole ratios $q(s)$ and the pairwise velocity 
dispersions $\sigma_{12}(r_p)$. A comparison with the results of Hawkins 
et al. (2003) shows that the standard $\Lambda$CDM model 
overpredicts the clustering power on small scales by a factor of 
about two, and the PVDs by about 350 km s$^{-1}$. After examining 
a variety of possibilities, we find that the only viable 
solution to these problems is to reduce the power spectrum 
amplitude, $\sigma_8$, from 0.9 to $\sim 0.75$. 

No doubt, in the coming years, new results from the 
2dFGRS and the SDSS will significantly improve the data 
on the large scale distribution of galaxies. The analysis pre- 
sented here, based on the conditional luminosity function, 
will hopefully prove a useful tool to further constrain both 
galaxy formation and cosmology. In this respect, our results 
regarding constraints on $\sigma_8$ are an important illustration of 
the potential power of this approach. 